Method

ABSTRACT

The invention relates to a method for modifying a template double stranded polynucleotide, especially for characterisation using nanopore sequencing. The method produces from the template a plurality of modified double stranded polynucleotides. These modified polynucleotides can then be characterised.

FIELD OF THE INVENTION

The invention relates to a method for modifying a template double stranded polynucleotide, especially for characterisation using nanopore sequencing. The method produces from the template a plurality of modified double stranded polynucleotides. These modified polynucleotides can then be characterised.

BACKGROUND OF THE INVENTION

There are many commercial situations which require the preparation of a nucleic acid library. This is frequently achieved using a transposase. Depending on the transposase which is used to prepare the library, it may be necessary to repair the transposition events in vitro before the library can be used, for example in sequencing.

There is currently a need for rapid and cheap polynucleotide (e.g. DNA or RNA) sequencing and identification technologies across a wide range of applications. Existing technologies are slow and expensive mainly because they rely on amplification techniques to produce large volumes of polynucleotide and require a high quantity of specialist fluorescent chemicals for signal detection.

Transmembrane pores (nanopores) have great potential as direct, electrical biosensors for polymers and a variety of small molecules. In particular, recent focus has been given to nanopores as a potential DNA sequencing technology.

When a potential is applied across a nanopore, there is a change in the current flow when an analyte, such as a nucleotide, resides transiently in the barrel for a certain period of time. Nanopore detection of the nucleotide gives a current change of known signature and duration. In the strand sequencing method, a single polynucleotide strand is passed through the pore and the identities of the nucleotides are derived. Strand sequencing can involve the use of a polynucleotide binding protein to control the movement of the polynucleotide through the pore.

SUMMARY OF THE INVENTION

The inventors have surprisingly demonstrated that it is possible to modify a template double stranded polynucleotide to produce a plurality of shorter, modified double stranded polynucleotides. The modified double stranded polynucleotides may include, for instance, a hairpin loop or a single stranded leader sequence. These modifications can be designed such that the modified double stranded polynucleotides are each easier to characterise, such as by strand sequencing, than the original template polynucleotide. Subsequent characterisation of the modified polynucleotides allows the character of the template polynucleotide to be more easily determined.

The modification method of the invention uses a MuA transposase, a population of MuA substrates and a polymerase and is summarised in FIG. 1. The MuA substrates comprise an overhang and a hairpin loop on opposite strands. The MuA transposase is capable of fragmenting the template polynucleotide and producing fragments with overhangs at both ends. The MuA transposase is also capable of ligating a substrate to the overhang at one or both ends of the double stranded fragments. The strand of the substrate without an overhang is typically ligated to the strand of the fragment with an overhang. This leaves a single stranded gap in the resulting double stranded construct. The double stranded construct also has a hairpin loop on the opposite strand from the gap.

The polymerase is capable of using the strand comprising the hairpin loop as a template and displacing the strand containing the single stranded gap. The resulting double stranded construct contains two complementary strands containing a fragment of the template polynucleotide. The two strands in this construct can be separated and, preferably simultaneously, used as templates to produce two double stranded constructs which comprise a fragment of the template polynucleotide and in which the two strands are linked by a hairpin loop.

Accordingly, the invention provides a method for modifying a template double stranded polynucleotide, comprising:

(a) contacting the template polynucleotide with a MuA transposase and a population of double stranded MuA substrates each comprising (i) at least one overhang and (ii) at least one hairpin loop in the opposite strand from the strand comprising the at least one overhang such that the transposase fragments the template polynucleotide and ligates a substrate to one or both ends of the double stranded fragments and thereby produces a plurality of fragment/substrate constructs;

(b) contacting the fragment/substrate constructs with a polymerase such that the polymerase displaces the strands comprising the overhangs and replaces them with strands which complement the strands comprising the hairpin loops and thereby produces a plurality of double stranded constructs each comprising a double stranded fragment of the template polynucleotide; and

(c) separating the two strands of the double stranded constructs and using the strands as templates to form a plurality of modified double stranded polynucleotides each comprising two complementary strands linked by at least one hairpin loop.

The invention also provides:

a plurality of modified double stranded polynucleotides produced using the method of the invention;

a population of double stranded polynucleotide MuA substrates for modifying a template polynucleotide, wherein the substrates are as defined above;

a method of characterising at least one polynucleotide modified using a method of the invention, comprising:

a) contacting the modified polynucleotide with a transmembrane pore such that at least one strand of the polynucleotide moves through the pore; and

b) taking one or more measurements as the at least one strand moves with respect to the pore wherein the measurements are indicative of one or more characteristics of the at least one strand and thereby characterising the modified polynucleotide;

a method of characterising a template polynucleotide, comprising:

a) modifying the template polynucleotide using the method of the invention to produce a plurality of modified polynucleotides;

b) contacting each modified polynucleotide with a transmembrane pore such that at least one strand of each polynucleotide moves through the pore; and

c) taking one or more measurements as each polynucleotide moves with respect to the pore wherein the measurements are indicative of one or more characteristics of each polynucleotide and thereby characterising the template polynucleotide; and

a kit for modifying a template double stranded polynucleotide comprising (a) a population of MuA substrates as defined above, (b) a MuA transposase and (c) a polymerase.

DESCRIPTION OF THE FIGURES

FIG. 1 shows a cartoon representation of a method of modifying a template double-stranded polynucleotide (labelled a). Step 1 involved contacting a template double-stranded polynucleotide with a MuA transposase (labelled b) and a population of double-stranded MuA substrates (labelled c, the double stranded MuA substrates each contained a 5′ hairpin loop) so that the MuA transposase fragmented the template double-stranded polynucleotide and inserted the MuA substrates at each side of the point of fragmentation. Step 2 involved treating the template strand with a polymerase (labelled e) and dNTPs which displaced the DNA fragments labelled d and produced complementary strands to the DNA 5′ hairpin loop. Step 3 involved heat treatment of the double-stranded DNA construct labelled f so that the strands were denatured into single-stranded DNA (labelled g). Finally, step 4 involved a DNA polymerase forming the complementary strand.

FIG. 2 shows a cartoon representation of the method of modifying a template double-stranded polynucleotide (labelled a) outlined in Example 1. Step 1 involved contacting a template double-stranded polynucleotide with a MuA transposase (labelled b) and a population of double-stranded MuA substrates (labelled c, the double stranded MuA substrates each contained a 5′ hairpin loop) so that the MuA transposase fragmented the template double-stranded polynucleotide and inserted the MuA substrates at each side of the point of fragmentation. Step 2 involved treating the template strand with a polymerase (labelled e) and dNTPs which displaced the DNA fragments labelled d and produced complementary strands to the DNA 5′ hairpin loop. Step 3 involved heat treatment of the double-stranded DNA construct labelled f so that the strands were denatured into single-stranded DNA (labelled g). Step 4 involved a second treatment with a DNA polymerase which formed the complementary strand. Finally, step 5 involved dA-tailing of the double-stranded DNA construct produced in step 4, ligation of an adapter which had an enzyme (labelled h) pre-bound and hybridisation of a DNA strand (labelled i) which contained a cholesterol tether (labelled j). This produced the final DNA construct which was tested in the nanopore system described in Example 1.

FIG. 3 shows an example current trace (y-axis label=Current (pA), x-axis label=Time (s)) of when a helicase (T4 Dda-E94C/C109A/C136A/A360C (SEQ ID NO: 24 with mutations E94C/C109A/C136A/A360C)) controlled the translocation of DNA sample 6 through an MspA nanopore.

FIG. 4 shows an Agilent 12,000 DNA chip trace. The line labelled 1 was the untreated MuA fragmented DNA input material, the line labelled 2 was the analyte that had the 68° C. incubation step (in 1.2 of Example 1) and subsequently had undergone all of step 1.3 of Example 1 and the line labelled 3 did not have the 68° C. incubation in step 1.2 of Example 1 but had undergone all of step 1.3 of Example 1. Region X corresponded to the double-stranded DNA library, region Y corresponded to the upper marker of the Agilent 12,000 and region Z corresponded to the lower marker of the Agilent 12,000 chip.

FIG. 5 shows a cartoon representation of a preferred method of modifying a template double-stranded polynucleotide (labelled a). FIG. 5 is identical to FIG. 1, except that each substrate comprises a leader sequence (labelled i) separated from the hairpin loop by a spacer (xxx; labelled h). The leader sequence was not used as a template because the polymerase (labelled e) could not move past the spacer.

FIG. 6 shows an example current trace (y-axis label=Current (pA), x-axis label=Time (s)) of when a helicase (T4 Dda-E94C/C109A/C136A/A360C) controlled the translocation of DNA sample 7 through an MspA nanopore.

FIG. 7 shows a cartoon representation of a method of modifying a template double stranded polynucleotide (labelled a). Step 1 involved contacting a template double-stranded polynucleotide with a MuA transposase (labelled b) and a population of double-stranded MuA substrates (labelled c, where the double stranded MuA substrates each contained a 5′ hairpin loop which contained I/Z's in the hairpin (labelled h and shown as black circles) which replaced the G/C's) so that the MuA transposase fragmented the template double-stranded polynucleotide and inserted the MuA substrates at each side of the point of fragmentation. Step 2 involved treating the template strand with a polymerase (labelled e) and dNTPs which displaced the DNA fragments labelled d and produced complementary strands to the DNA 5′ hairpin loop (dsDNA produced was labelled f). The double stranded region (labelled 1X) formed by the polymerase is made up of two strands which are both capable of forming a hairpin loop. The hairpin loop formed by strand f2 has a higher Tm than the Tm of the double stranded region 1X. This is because strand f2's hairpin loop is made up of C/T/A/G and the double stranded region 1X is strand f2 hybridised to strand f1, where strand f1 is made up of Z/T/A/I (and Z and I only form two hydrogen bonds whereas C/G form 3 hydrogen bonds). Therefore, f2 forms a hairpin loop (labelled f2h) and f1 forms a hairpin loop (labelled f1h); the hairpin loop formed by strand f1 has a higher Tm than the hairpin loop formed by strand f2. The DNA polymerase was then able to produce complementary strands shown as a dash/dotted line (the entire dsDNA construct labelled i1 and i2). Therefore, the polymerase was able to form a complementary strand (shown as a dashed/dotted line) without needing to heat the dsDNA produced in step 2 (and labelled f1 hybridised to f2).

FIG. 8 shows a cartoon representation of a preferred method of the invention. Steps 1 to 4 were the same as in FIG. 1. Step 5 involved adding a hairpin loop to the construct formed in FIG. 1. Step 6 involved heat treatment of the modified double-stranded polynucleotide so that the strands were denatured into single-stranded construct. Finally, step 7 involved a DNA polymerase forming the complementary strand.

DESCRIPTION OF THE SEQUENCE LISTING

SEQ ID NO: 1 shows the codon optimised polynucleotide sequence encoding the MS-B1 mutant MspA monomer. This mutant lacks the signal sequence and includes the following mutations: D90N, D91N, D93N, D118R, D134R and E139K.

SEQ ID NO: 2 shows the amino acid sequence of the mature form of the MS-B1 mutant of the MspA monomer. This mutant lacks the signal sequence and includes the following mutations: D90N, D91N, D93N, D118R, D134R and E139K.

SEQ ID NO: 3 shows the polynucleotide sequence encoding one monomer of α-hemolysin-E111N/K147N (α-HL-NN; Stoddart et al., PNAS, 2009; 106(19): 7702-7707).

SEQ ID NO: 4 shows the amino acid sequence of one monomer of α-HL-NN.

SEQ ID NOs: 5 to 7 show the amino acid sequences of MspB, C and D.

SEQ ID NO: 8 shows the polynucleotide sequence encoding the Phi29 DNA polymerase.

SEQ ID NO: 9 shows the amino acid sequence of the Phi29 DNA polymerase.

SEQ ID NO: 10 shows the codon optimised polynucleotide sequence derived from the sbcB gene from E. coli. It encodes the exonuclease I enzyme (EcoExo I) from E. coli.

SEQ ID NO: 11 shows the amino acid sequence of exonuclease I enzyme (EcoExo I) from E. coli.

SEQ ID NO: 12 shows the codon optimised polynucleotide sequence derived from the xthA gene from E. coli. It encodes the exonuclease III enzyme from E. coli.

SEQ ID NO: 13 shows the amino acid sequence of the exonuclease III enzyme from E. coli. This enzyme performs distributive digestion of 5′ monophosphate nucleosides from one strand of double stranded DNA (dsDNA) in a 3′-5′ direction. Enzyme initiation on a strand requires a 5′ overhang of approximately 4 nucleotides.

SEQ ID NO: 14 shows the codon optimised polynucleotide sequence derived from the recJ gene from T. thermophilus. It encodes the RecJ enzyme from T. thermophilus (TthRecJ-cd).

SEQ ID NO: 15 shows the amino acid sequence of the RecJ enzyme from T. thermophilus (TthRecJ-cd). This enzyme performs processive digestion of 5′ monophosphate nucleosides from ssDNA in a 5′-3′ direction. Enzyme initiation on a strand requires at least 4 nucleotides.

SEQ ID NO: 16 shows the codon optimised polynucleotide sequence derived from the bacteriophage lambda exo (redX) gene. It encodes the bacteriophage lambda exonuclease.

SEQ ID NO: 17 shows the amino acid sequence of the bacteriophage lambda exonuclease. The sequence is one of three identical subunits that assemble into a trimer. The enzyme performs highly processive digestion of nucleotides from one strand of dsDNA, in a 5′-3′ direction (http://www.neb.com/nebecomm/products/productM0262.asp). Enzyme initiation on a strand preferentially requires a 5′ overhang of approximately 4 nucleotides with a 5′ phosphate.

SEQ ID NO: 18 shows the amino acid sequence of Hel308 Mbu.

SEQ ID NO: 19 shows the amino acid sequence of Hel308 Csy.

SEQ ID NO: 20 shows the amino acid sequence of Hel308 Tga.

SEQ ID NO: 21 shows the amino acid sequence of Hel308 Mhu.

SEQ ID NO: 22 shows the amino acid sequence of TraI Eco.

SEQ ID NO: 23 shows the amino acid sequence of XPD Mbu.

SEQ ID NO: 24 shows the amino acid sequence of Dda 1993.

SEQ ID NO: 25 shows the amino acid sequence of TrwC Cba.

SEQ ID NOs: 26 to 28 show the sequences of preferred MuA substrates of the invention.

SEQ ID NO: 29 shows a polynucleotide sequence used in Example 1.

SEQ ID NO: 30 shows a polynucleotide sequence used in Example 1. This sequence has the following polynucleotide sequence attached at its 5′ end: GATCU.

SEQ ID NO: 31 shows the polynucleotide sequence, used in Example 1, of the Enterobacteria phage λ. The sequence contains an additional 12 base overhang attached at the 5′ end of the template strand. The sequence shown here is that of the template strand only (the template complement is not shown).

SEQ ID NO: 32 shows a polynucleotide sequence used in Example 1.

SEQ ID NO: 33 shows a polynucleotide sequence used in Example 1.

SEQ ID NO: 34 shows a polynucleotide sequence used in Example 1.

SEQ ID NO: 35 shows a polynucleotide sequence used in Example 1.

SEQ ID NO: 36 shows a polynucleotide sequence used in Example 2.

SEQ ID NO: 37 shows a polynucleotide sequence used in Example 2.

DETAILED DESCRIPTION OF THE INVENTION

It is to be understood that different applications of the disclosed products and methods may be tailored to the specific needs in the art. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments of the invention only, and is not intended to be limiting.

In addition, as used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a polynucleotide” includes “polynucleotides”, reference to “a substrate” includes two or more such substrates, reference to “a transmembrane protein pore” includes two or more such pores, and the like.

All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety.

Modification Method of the Invention

The present invention provides a method of modifying a template polynucleotide. The template may be modified for any purpose. The method is preferably for modifying a template polynucleotide for characterisation, such as for strand sequencing. The template polynucleotide is typically the polynucleotide that will ultimately be characterised, or sequenced, in accordance with the invention. This is discussed in more detail below.

The method involves the formation of a plurality of modified double stranded polynucleotides. These modified double stranded polynucleotides are typically easier to characterise than the template polynucleotide, especially using strand sequencing. The plurality of modified double stranded polynucleotides may themselves be characterised in order to facilitate the characterisation of the template polynucleotide. For instance, the sequence of the template polynucleotide can be determined by sequencing each of the modified double stranded polynucleotides.

The modified double stranded polynucleotides are typically shorter than the template polynucleotide and so it is more straightforward to characterise them using strand sequencing. The modified double stranded polynucleotides also include double the amount of information, as discussed below.

The modified double stranded polynucleotides can be selectively labelled by including the labels in the MuA substrates. Suitable labels include, but are not limited to, calibration sequences, coupling moieties and adaptor bound enzymes.

In some embodiments, the method introduces into the double stranded polynucleotides modifications which facilitate their characterisation using strand sequencing. It is well-established that coupling a polynucleotide to the membrane containing the nanopore lowers by several orders of magnitude the amount of polynucleotide required to allow its characterisation or sequencing. This is discussed in International Application No. PCT/GB2012/051191 (published as WO 2012/164270). The method of the invention allows the production of a plurality of double stranded polynucleotides each of which includes a means for coupling the polynucleotides to a membrane. This is discussed in more detail below.

The characterisation of double stranded polynucleotides using a nanopore typically requires the presence of a leader sequence designed to preferentially thread into the nanopore. The method of the invention allows the production of a plurality of double stranded polynucleotides each of which includes a single stranded leader sequence. This is discussed in more detail below.

It is also well established that linking the two strands of a double stranded polynucleotide by a bridging moiety, such as a hairpin loop, allows both strands of the polynucleotide to be characterised or sequenced by a nanopore. This is advantageous because it doubles the amount of information obtained from a single double stranded polynucleotide. Moreover, because the sequence in the template complement strand is necessarily orthogonal to the sequence of the template strand, the information from the two strands can be combined informatically. Thus, this mechanism provides an orthogonal proof-reading capability that provides higher confidence observations. This is discussed in International Application No. PCT/GB2012/051786 (published as WO 2013/014451). The method of the invention allows the production of a plurality of modified double stranded polynucleotides in which the two strands of each polynucleotide are linked using a hairpin loop.
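By way of illustration only, the informatic combination of the two strands may be sketched as follows. The sketch assumes error-free alignment, reads of equal length and a DNA alphabet of A, C, G and T; the function names and the use of “N” to mark a disagreement are hypothetical choices made for this example and are not part of the method of the invention.

```python
# Illustrative sketch: combine a read of the template strand with a read of
# the template complement strand (both written 5'->3').
# Assumptions (not from the specification): equal-length, pre-aligned reads,
# and "N" as the marker for positions where the two strands disagree.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def combine_strand_reads(template_read, complement_read):
    """Keep a base where both strands agree; report 'N' where they do not."""
    inferred_template = reverse_complement(complement_read)
    if len(inferred_template) != len(template_read):
        raise ValueError("reads must cover the same fragment")
    return "".join(
        t if t == c else "N"
        for t, c in zip(template_read, inferred_template)
    )


template_read = "GATTACAG"
complement_read = "CTGTTATC"  # implies GATAACAG, one disagreement at position 4
consensus = combine_strand_reads(template_read, complement_read)
print(consensus)
```

A position supported by both strands is reported with higher confidence, whereas a disagreement is flagged rather than silently resolved.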

Template Polynucleotide

The method of the invention modifies a template double strandedpolynucleotide, preferably for characterisation. The templatepolynucleotide is typically the polynucleotide that will ultimately becharacterised, or sequenced, in accordance with the invention. It mayalso be called the target double stranded polynucleotide or the doublestranded polynucleotide of interest.

A polynucleotide, such as a nucleic acid, is a macromolecule comprisingtwo or more nucleotides. The polynucleotide or nucleic acid may compriseany combination of any nucleotides. The nucleotides can be naturallyoccurring or artificial. One or more nucleotides in the polynucleotidecan be oxidized or methylated. One or more nucleotides in thepolynucleotide may be damaged. For instance, the polynucleotide maycomprise a pyrimidine dimer. Such dimers are typically associated withdamage by ultraviolet light and are the primary cause of skin melanomas.One or more nucleotides in the polynucleotide may be modified, forinstance with a label or a tag. Suitable labels are described below. Thepolynucleotide may comprise one or more spacers.

A nucleotide typically contains a nucleobase, a sugar and at least onephosphate group. The nucleobase and sugar form a nucleoside.

The nucleobase is typically heterocyclic. Nucleobases include, but arenot limited to, purines and pyrimidines and more specifically adenine(A), guanine (G), thymine (T), uracil (U) and cytosine (C).

The sugar is typically a pentose sugar. Nucleotide sugars include, butare not limited to, ribose and deoxyribose. The sugar is preferably adeoxyribose.

The polynucleotide preferably comprises the following nucleosides:deoxyadenosine (dA), deoxyuridine (dU) and/or thymidine (dT),deoxyguanosine (dG) and deoxycytidine (dC).

The nucleotide is typically a ribonucleotide or deoxyribonucleotide. Thenucleotide typically contains a monophosphate, diphosphate ortriphosphate. The nucleotide may comprise more than three phosphates,such as 4 or 5 phosphates. Phosphates may be attached on the 5′ or 3′side of a nucleotide. Nucleotides include, but are not limited to,adenosine monophosphate (AMP), guanosine monophosphate (GMP), thymidinemonophosphate (TMP), uridine monophosphate (UMP), 5-methylcytidinemonophosphate, 5-hydroxymethylcytidine monophosphate, cytidinemonophosphate (CMP), cyclic adenosine monophosphate (cAMP), cyclicguanosine monophosphate (cGMP), deoxyadenosine monophosphate (dAMP),deoxyguanosine monophosphate (dGMP), deoxythymidine monophosphate(dTMP), deoxyuridine monophosphate (dUMP), deoxycytidine monophosphate(dCMP) and deoxymethylcytidine monophosphate. The nucleotides arepreferably selected from AMP, TMP, GMP, CMP, UMP, dAMP, dTMP, dGMP, dCMPand dUMP.

A nucleotide may be abasic (i.e. lack a nucleobase). A nucleotide mayalso lack a nucleobase and a sugar (i.e. is a C3 spacer).

The nucleotides in the polynucleotide may be attached to each other inany manner. The nucleotides are typically attached by their sugar andphosphate groups as in nucleic acids. The nucleotides may be connectedvia their nucleobases as in pyrimidine dimers.

The polynucleotide is double stranded. At least a portion of thepolynucleotide is preferably double stranded.

The polynucleotide can be a nucleic acid, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The polynucleotide can comprise one strand of RNA hybridised to one strand of DNA. The polynucleotide may be any synthetic nucleic acid known in the art, such as peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA), locked nucleic acid (LNA) or other synthetic polymers with nucleotide side chains. The PNA backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. The GNA backbone is composed of repeating glycol units linked by phosphodiester bonds. The TNA backbone is composed of repeating threose sugars linked together by phosphodiester bonds. LNA is formed from ribonucleotides as discussed above having an extra bridge connecting the 2′ oxygen and 4′ carbon in the ribose moiety.

The polynucleotide is most preferably ribonucleic acid (RNA) or deoxyribonucleic acid (DNA).

The polynucleotide can be any length. For example, the polynucleotide can be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least 500 nucleotides or nucleotide pairs in length. The polynucleotide can be 1000 or more nucleotides or nucleotide pairs, 5000 or more nucleotides or nucleotide pairs in length or 100000 or more nucleotides or nucleotide pairs in length.

Any number of polynucleotides can be investigated using the invention. For instance, the invention may concern characterising 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100 or more polynucleotides. If two or more polynucleotides are characterised, they may be different polynucleotides or two instances of the same polynucleotide.

The polynucleotide can be naturally occurring or artificial. For instance, the method may be used to verify the sequence of a manufactured oligonucleotide. The method is typically carried out in vitro.

The template polynucleotide is typically present in any suitable sample. The invention is typically carried out on a sample that is known to contain or suspected to contain the template polynucleotide. Alternatively, the invention may be carried out on a sample to confirm the identity of one or more template polynucleotides whose presence in the sample is known or expected.

The sample may be a biological sample. The invention may be carried out in vitro on a sample obtained from or extracted from any organism or microorganism. The organism or microorganism is typically archaeal, prokaryotic or eukaryotic and typically belongs to one of the five kingdoms: plantae, animalia, fungi, monera and protista. The invention may be carried out in vitro on a sample obtained from or extracted from any virus. The sample is preferably a fluid sample. The sample typically comprises a body fluid of the patient. The sample may be urine, lymph, saliva, mucus or amniotic fluid but is preferably blood, plasma or serum. Typically, the sample is human in origin, but alternatively it may be from another mammal, such as from commercially farmed animals such as horses, cattle, sheep or pigs, or alternatively from pets such as cats or dogs. Alternatively, a sample of plant origin is typically obtained from a commercial crop, such as a cereal, legume, fruit or vegetable, for example wheat, barley, oats, canola, maize, soya, rice, bananas, apples, tomatoes, potatoes, grapes, tobacco, beans, lentils, sugar cane, cocoa or cotton.

The sample may be a non-biological sample. The non-biological sample is preferably a fluid sample. Examples of a non-biological sample include surgical fluids, water such as drinking water, sea water or river water, and reagents for laboratory tests.

The sample is typically processed prior to being used in the invention, for example by centrifugation or by passage through a membrane that filters out unwanted molecules or cells, such as red blood cells. The sample may be measured immediately upon being taken. The sample may also be stored prior to assay, preferably below −70° C.

MuA and Conditions

The template polynucleotide is contacted with a MuA transposase. This contacting occurs under conditions which allow the transposase to function, i.e. to fragment the template polynucleotide and to ligate MuA substrates to one or both ends of the fragments. MuA transposase is commercially available, for instance from Thermo Scientific (Catalogue Number F-750C, 20 μL (1.1 μg/μL)). Conditions under which MuA transposase will function are known in the art. Suitable conditions are described in the Examples.

Population of Substrates

The template polynucleotide is contacted with a population of double stranded MuA substrates. The double stranded substrates are polynucleotide substrates and may be formed from any of the nucleotides or nucleic acids discussed above. The substrates are typically formed from the same nucleotides as the template polynucleotide.

The population of substrates is typically homogenous (i.e. typically contains a plurality of identical substrates). The population of substrates may be heterogeneous (i.e. may contain a plurality of different substrates).

Suitable substrates for a MuA transposase are known in the art (Saariaho and Savilahti, Nucleic Acids Research, 2006; 34(10): 3139-3149 and Lee and Harshey, J. Mol. Biol., 2001; 314: 433-444).

Each substrate typically comprises a double stranded portion which provides its activity as a substrate for MuA transposase. The double stranded portion is typically the same in each substrate. The population of substrates may comprise different double stranded portions.

The double stranded portion in each substrate is typically at least 50 nucleotide pairs in length, such as at least 55, at least 60 or at least 65 nucleotide pairs in length. The double stranded portion in each substrate preferably comprises a dinucleotide comprising deoxycytidine (dC) and deoxyadenosine (dA) at the 3′ end of each strand. The dC and dA are typically in different orientations in the two strands of the double stranded portion, i.e. one strand has dC/dA and the other strand has dA/dC at the 3′ end when reading from 5′ to 3′.

One strand of the double stranded portion preferably comprises the sequence shown in SEQ ID NO: 26 and the other strand of the double stranded portion preferably comprises the sequence shown in SEQ ID NO: 27.

(SEQ 26) 5′-GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA-3′
(SEQ 27) 3′-CAAAAGCGTAAATAGCACTTTGCGAAAGCGCAAAAAGCACGCGGCGAAGT-5′
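The properties of the double stranded portion described above can be checked with a short script, given here purely as an illustrative sketch. The sequences are copied from SEQ ID NOs: 26 and 27 as printed, with the internal space removed on the assumption that it is a typesetting artefact; the variable and function names are hypothetical.

```python
# Illustrative check of the double stranded portion formed by SEQ ID NO: 26
# and SEQ ID NO: 27. Names are hypothetical; sequences are as printed above.

SEQ26_5TO3 = "GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA"  # written 5'->3'
SEQ27_3TO5 = "CAAAAGCGTAAATAGCACTTTGCGAAAGCGCAAAAAGCACGCGGCGAAGT"  # written 3'->5'

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_fully_paired(top_5to3, bottom_3to5):
    """True if every position of the two antiparallel strands is a Watson-Crick pair."""
    return len(top_5to3) == len(bottom_3to5) and all(
        PAIR[a] == b for a, b in zip(top_5to3, bottom_3to5)
    )


# Re-write SEQ ID NO: 27 in the conventional 5'->3' direction.
seq27_5to3 = SEQ27_3TO5[::-1]

paired = is_fully_paired(SEQ26_5TO3, SEQ27_3TO5)
duplex_length = len(SEQ26_5TO3)
three_prime_dinucleotides = (SEQ26_5TO3[-2:], seq27_5to3[-2:])

print(paired, duplex_length, three_prime_dinucleotides)
```

Under these assumptions the two strands pair along their whole length of 50 nucleotide pairs, and the 3′ dinucleotides read dC/dA on one strand and dA/dC on the other when each strand is read from 5′ to 3′.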

Each substrate comprises at least one overhang. The overhang istypically a nucleotide overhang. There may be an overhang at one or bothends of each substrate. If the double stranded portion in each substratecomprises the sequence shown in SEQ ID NO: 26 hybridised to the sequenceshown in SEQ ID NO: 27, the at least one overhang is preferably at the5′ end of the sequence shown in SEQ ID NO: 27.

Each substrate may comprise two overhangs, i.e. one at both ends of eachsubstrate. If there is an overhang at both ends of a substrate, eachoverhang is typically on different strands of the double strandedpolynucleotide portion. Overhangs are preferably located at the 5′ endof a strand of the double stranded portion.

Each substrate preferably comprises only one overhang. The only oneoverhang is preferably at the 5′ end of one strand of the doublestranded portion.

The overhang may be at least 3, at least 4, at least 5, at least 6 or at least 7 nucleotides in length. The overhang is preferably 5 nucleotides in length.

In a preferred embodiment, one strand of the substrate comprises the sequence shown in SEQ ID NO: 26 and the other strand of the substrate comprises the sequence shown in SEQ ID NO: 28 (see below).

(SEQ 26) 5′-GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA-3′
(SEQ 28) 3′-CAAAAGCGTAAATAGCACTTTGCGAAAGCGCAAAAAGCACGCGGCGAAGTCTAG-5′
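The pairing between the listed sequences can be checked mechanically. The following is a minimal sketch in Python, offered only as an illustration: the helper names (`complement`, `three_prime_dinucleotide`) are hypothetical and not part of the method, and the sequences are copied from the listings above with SEQ ID NOs: 27 and 28 kept in the 3′ to 5′ orientation in which they are written.

```python
# Sequences copied from the text. SEQ ID NO: 26 is written 5'->3';
# SEQ ID NOs: 27 and 28 are written 3'->5', as in the listing.
SEQ26_5TO3 = "GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA"
SEQ27_3TO5 = "CAAAAGCGTAAATAGCACTTTGCGAAAGCGCAAAAAGCACGCGGCGAAGT"
SEQ28_3TO5 = "CAAAAGCGTAAATAGCACTTTGCGAAAGCGCAAAAAGCACGCGGCGAAGTCTAG"

# Watson-Crick partners for DNA nucleobases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base-by-base complement (orientation is flipped implicitly)."""
    return "".join(COMPLEMENT[base] for base in seq)


def three_prime_dinucleotide(seq_5to3):
    """Return the last two bases of a strand written 5'->3'."""
    return seq_5to3[-2:]


# The complement of SEQ ID NO: 26, read 3'->5', should equal SEQ ID NO: 27.
paired = complement(SEQ26_5TO3)
is_duplex = paired == SEQ27_3TO5

# SEQ ID NO: 28 is SEQ ID NO: 27 extended at its 5' end; the extra bases
# form the overhang. Reverse them to read the overhang 5'->3'.
overhang_3to5 = SEQ28_3TO5[len(paired):]
overhang_5to3 = overhang_3to5[::-1]

# 3' terminal dinucleotides, each read 5'->3'.
dinucleotide_26 = three_prime_dinucleotide(SEQ26_5TO3)
dinucleotide_27 = three_prime_dinucleotide(SEQ27_3TO5[::-1])

print(is_duplex, overhang_5to3, dinucleotide_26, dinucleotide_27)
```

Under these assumptions, the check confirms that the two strands are fully complementary over 50 nucleotide pairs and that the 3′ ends carry dC/dA on one strand and dA/dC on the other.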

The substrates in the population may have any of the structures disclosed in International Application No. PCT/GB2014/052505.

Each substrate comprises at least one hairpin loop in the opposite strand from the strand comprising the at least one overhang. The hairpin loop typically does not link the two strands of the substrate. The hairpin loop may be an internal hairpin loop, i.e. not at the end of the opposite strand from the strand comprising the at least one overhang. Internal hairpin loops are preferably adjacent to a spacer past which any polymerase used in the method of the invention cannot move. The spacer may be located on either side of the hairpin loop. Any of the spacers discussed below, such as one or more iSpC3 groups (i.e. nucleotides which lack sugar and a base), one or more spacer 9 (iSp9) groups or one or more spacer 18 (iSp18) groups, may be used. Internal hairpin loops are preferably adjacent to non-natural nucleotides, such as nitroindoles, past which any polymerase used in the method of the invention cannot move. Any of the different nucleotide species discussed below may be used.

The hairpin loop is preferably at or near the end of the opposite strand from the strand comprising the at least one overhang. The hairpin loop is near the end of the opposite strand from the strand comprising the at least one overhang if it is 20 nucleotides or fewer, 15 nucleotides or fewer, 10 nucleotides or fewer or 5 nucleotides or fewer from the end of the opposite strand from the strand comprising the at least one overhang. The hairpin loop is 20 nucleotides or fewer from the end of the strand if there are 20 nucleotides or fewer between the last nucleotide forming the stem part (hybridised part) of the hairpin loop and the end of the strand. The hairpin loop is preferably at the end of the opposite strand from the strand comprising the at least one overhang. There may be a hairpin loop at one or both ends of each substrate. The hairpin loop is preferably at the opposite end of the substrate from the at least one overhang.

The hairpin loop is typically a nucleotide hairpin loop. If the double stranded portion in each substrate comprises the sequence shown in SEQ ID NO: 26 hybridised to the sequence shown in SEQ ID NO: 27, the at least one hairpin loop is preferably at the 5′ end of the sequence shown in SEQ ID NO: 26.

Each substrate may comprise two hairpin loops, i.e. one in both strands of each substrate or one at both ends of each substrate. If there is a hairpin loop at both ends of a substrate, each hairpin loop is typically on different strands of the double stranded polynucleotide portion. Hairpin loops are preferably located at the 5′ end of a strand of the double stranded portion.

Each substrate preferably comprises only one hairpin loop. The only one hairpin loop is preferably in the opposite strand from the strand comprising the at least one overhang. The only one hairpin loop is preferably at the opposite end of the substrate from the at least one overhang and in the opposite strand from the strand comprising the at least one overhang. The only one hairpin loop is preferably at the 5′ end of one strand of the double stranded portion and in the opposite strand from the strand comprising the at least one overhang.

In a preferred embodiment, each substrate comprises one overhang at the 5′ end of one strand of the double stranded portion and a hairpin loop at the 5′ end of the other strand of the double stranded portion. In a most preferred embodiment, one strand of the substrate comprises the sequence shown in SEQ ID NO: 26 and the other strand of the substrate comprises the sequence shown in SEQ ID NO: 28 (see above) and the hairpin loop is at the 5′ end of the sequence shown in SEQ ID NO: 26.

Suitable hairpin loops can be designed using methods known in the art. The hairpin loop may be any length. The hairpin loop is typically 110 or fewer nucleotides, such as 100 or fewer nucleotides, 90 or fewer nucleotides, 80 or fewer nucleotides, 70 or fewer nucleotides, 60 or fewer nucleotides, 50 or fewer nucleotides, 40 or fewer nucleotides, 30 or fewer nucleotides, 20 or fewer nucleotides or 10 or fewer nucleotides, in length. The hairpin loop is preferably from about 1 to 110, from 2 to 100, from 5 to 80 or from 6 to 50 nucleotides in length.

The hairpin loop may be formed from any of the nucleotides discussed above. The hairpin loop may be formed from the same nucleotides as the double stranded portion. The hairpin loop is preferably formed from nucleotides which result in the hairpin loop having a lower melting temperature (Tm) than the double stranded portion. Melting temperature can be measured using routine techniques. If the double stranded portion comprises RNA, the hairpin is preferably formed from nucleotides containing adenosine (A), uridine (U), inosine (I) and zebularine (Z). If the double stranded portion comprises DNA, the hairpin is preferably formed from nucleotides containing deoxyadenosine (dA), thymidine (dT), deoxyinosine (dI) and deoxyzebularine (dZ). The replacement of guanosine (G)/deoxyguanosine (dG) with inosine (I)/deoxyinosine (dI) and the replacement of cytidine (C)/deoxycytidine (dC) with zebularine (Z)/deoxyzebularine (dZ) reduces the Tm of the hairpin compared with the double stranded portion. I/dI and Z/dZ only form two hydrogen bonds whereas G/dG and C/dC form three hydrogen bonds. In the method of the invention, the polymerase replaces the overhang strands with new strands which complement the strands comprising the hairpin loops. The hairpin loops having a lower Tm may be used to form complementary hairpins having a higher Tm, i.e. hairpins formed from nucleotides having a higher Tm. The polymerase may replace the overhang strands with new strands which complement the strands comprising the hairpin loops, wherein the new strands comprise hairpin loops having a higher Tm than the hairpin loops in the template strands. For instance, a hairpin loop formed from nucleotides containing adenosine (A)/deoxyadenosine (dA), uridine (U)/thymidine (dT), inosine (I)/deoxyinosine (dI) and zebularine (Z)/deoxyzebularine (dZ) may be used to form a complementary RNA or DNA hairpin loop. The difference in Tm between the two hairpins means that they are more stable as individual hairpins than hybridised together. This means that the two hairpin loops form their respective loops rather than hybridise together. This facilitates the last step of the method in which the two strands of the double stranded constructs are separated and used as templates to form a plurality of modified double stranded polynucleotides each comprising two complementary strands linked by at least one hairpin loop. For instance, the separation may be performed at room temperature.
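The effect of the inosine and zebularine substitutions on hydrogen bonding can be illustrated with a short Python sketch. This is an assumption-laden illustration only: the stem sequence is hypothetical, the function names are invented for the example, and a hydrogen bond count is used as a crude proxy for relative stability rather than as a Tm calculation. The values for I and Z follow the statement above that they form two hydrogen bonds; the values of two for A, T and U are standard Watson and Crick pairing.

```python
# Hydrogen bonds contributed by each nucleobase when paired.
# G and C form three; A, T and U form two; I and Z form two (per the text).
H_BONDS = {"A": 2, "T": 2, "U": 2, "I": 2, "Z": 2, "G": 3, "C": 3}

# Substitutions described in the text: G -> I and C -> Z.
LOW_TM_SUBSTITUTION = str.maketrans({"G": "I", "C": "Z"})


def stem_hydrogen_bonds(stem):
    """Total hydrogen bonds formed by one strand of a fully paired stem."""
    return sum(H_BONDS[base] for base in stem)


def lower_tm_variant(stem):
    """Replace G with I and C with Z to obtain the weaker-pairing variant."""
    return stem.translate(LOW_TM_SUBSTITUTION)


# Hypothetical 8-nucleotide stem, for illustration only.
stem = "GCGCATGC"
variant = lower_tm_variant(stem)

print(stem, stem_hydrogen_bonds(stem))        # GCGCATGC 22
print(variant, stem_hydrogen_bonds(variant))  # IZIZATIZ 16
```

In this sketch each substituted position loses one hydrogen bond, which is consistent with the substituted hairpin having the lower Tm.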

Each substrate may comprise a selectable binding moiety. If present, the selectable binding moiety is preferably in the hairpin loop. A selectable binding moiety is a moiety that can be selected on the basis of its binding properties. Hence, a selectable binding moiety is preferably a moiety that specifically binds to a surface. A selectable binding moiety specifically binds to a surface if it binds to the surface to a much greater degree than any other moiety used in the invention. In preferred embodiments, the moiety binds to a surface to which no other moiety used in the invention binds.

Suitable selectable binding moieties are known in the art. Preferred selectable binding moieties include, but are not limited to, biotin, a nucleic acid sequence, antibodies, antibody fragments, such as Fab and scFv, antigens, nucleic acid binding proteins, polyhistidine tails and GST tags. The most preferred selectable binding moieties are biotin and a selectable nucleic acid sequence. Biotin specifically binds to a surface coated with avidins. Selectable nucleic acid sequences specifically bind (i.e. hybridise) to a surface coated with homologous sequences. Alternatively, selectable nucleic acid sequences specifically bind to a surface coated with nucleic acid binding proteins.

Each substrate may comprise a leader sequence. The leader sequence is typically on the same strand as the at least one hairpin loop. The leader sequence is typically at the same end of the substrate as the hairpin loop. The leader sequence is typically located at the end of the strand comprising the at least one hairpin loop (i.e. the hairpin loop is located between the terminal leader sequence and the rest of the substrate). The leader sequence is typically separated from the hairpin loop by a spacer past which any polymerase used in the method of the invention cannot move. Any of the spacers discussed below, such as one or more iSpC3 groups (i.e. nucleotides which lack sugar and a base), one or more spacer 9 (iSp9) groups or one or more spacer 18 (iSp18) groups, may be used. The spacer means that the leader sequence is not used as a template in steps (b) and (c) and so remains single stranded at the end of the method. This allows the leader sequence to perform its function. An example of this is shown in FIG. 5.

The leader sequence preferentially threads into the pore. The leader sequence facilitates the characterisation method of the invention. The leader sequence is designed to preferentially thread into the pore and thereby facilitate the movement of polynucleotide through the pore. The leader sequence can also be used to link the polynucleotide to the one or more anchors as discussed below. The leader sequence typically comprises a polymer. The polymer is preferably negatively charged. The polymer is preferably a polynucleotide, such as DNA or RNA, a modified polynucleotide (such as abasic DNA), PNA, LNA, polyethylene glycol (PEG) or a polypeptide. The leader preferably comprises a polynucleotide and more preferably comprises a single stranded polynucleotide. The leader sequence can comprise any of the polynucleotides discussed above. The single stranded leader sequence most preferably comprises a single strand of DNA, such as a poly dT section. The leader sequence preferably comprises one or more spacers.

The leader sequence can be any length, but is typically 10 to 150 nucleotides in length, such as from 20 to 150 nucleotides in length. The length of the leader typically depends on the transmembrane pore used in the method.

Fragmentation

The transposase fragments the template double stranded polynucleotide to form a plurality of double stranded fragments. The transposase also ligates a substrate to one or both ends of the double stranded fragments and thereby produces a plurality of fragment/substrate constructs. The transposase preferably ligates a substrate to both ends of the double stranded fragments and thereby produces a plurality of fragment/substrate constructs each having a hairpin loop at both ends. An example of this can be seen in FIG. 1.

Polymerase

The fragment/substrate constructs produced by the transposase are contacted with a polymerase. Any of the polymerases discussed below may be used. The polymerase is preferably Klenow or 9° North. The polymerase is more preferably LongAmp® Taq DNA Polymerase (which is commercially available from New England Biolabs® Inc.), Phusion® High-Fidelity DNA Polymerase (which is commercially available from New England Biolabs® Inc.) or KAPA HiFi (which is commercially available from KAPA Biosystems).

The constructs are contacted with the polymerase under conditions in which the polymerase can displace the overhang strands and form complement polynucleotides. Such conditions are known in the art. For instance, the constructs are typically contacted with the polymerase in commercially available polymerase buffer, such as buffer from New England Biolabs® or KAPA Biosystems. The temperature is preferably from 20 to 37° C. for Klenow or from 60 to 75° C. for 9° North, LongAmp® Taq DNA Polymerase, Phusion® High-Fidelity DNA Polymerase or KAPA HiFi.

The polymerase displaces the strands comprising the overhangs from the fragment/substrate constructs. The polymerase replaces the overhang strands with new strands which complement the strands comprising the hairpin loops. This produces a plurality of double stranded constructs each comprising a double stranded fragment of the template polynucleotide. Part of each new strand formed by the polymerase is typically complementary to the hairpin loop. This means that the hairpin loops typically form part of the double stranded polynucleotide in the constructs. An example of this can be seen in FIG. 1.

The polymerase may form new strands comprising any of the nucleotides discussed above and below. The polymerase is provided with a population of free nucleotides which complement the nucleotides in the strands comprising the hairpin loops. The polymerase may use the free nucleotides to form the new strands.

Separation/Replication

The two strands of the double stranded constructs are separated and the strands are used as templates to form a plurality of modified double stranded polynucleotides each comprising two complementary strands linked by at least one hairpin loop. An example of this is shown in FIG. 1.

The two strands may be completely separated before they are used as templates. The two strands may be separated and used as templates at the same time (i.e. simultaneously). In other words, the two strands do not need to be completely separated or the two strands may be partially separated before they are used as templates.

The two strands may be separated in any manner. The method preferably comprises separating the two strands of the double stranded constructs by increasing one or more of pH, temperature and ionic strength. An increased temperature is preferred. The method preferably comprises increasing the temperature to 95° C. The method preferably comprises increasing the temperature to 95° C. and then decreasing the temperature to 55° C. The method preferably comprises increasing the temperature to 95° C., decreasing the temperature to 55° C. and increasing the temperature to 68° C. The method most preferably comprises incubating the double stranded constructs for 2 minutes at 95° C., 30 seconds at 55° C. and 30 minutes at 68° C. Increases in pH may be achieved using formamide or sodium hydroxide (NaOH). Enzymes, such as a helicase or one that digests the template strand (e.g. USER if that strand had dU instead of dT), may also be used to separate the strands. Any of the helicases discussed below may be used.

As discussed in more detail below, the two strands may be separated using a polymerase. The polymerase may be any of those discussed above or below.

Any method may be used to form new polynucleotides using the separated strands as templates. The method preferably comprises contacting the strands with a polymerase such that the polymerase uses the strands as templates to form the plurality of modified double stranded polynucleotides. Any of the polymerases discussed above or below may be used.

Alternatively, the method may comprise (i) contacting the plurality of strands with a population of nucleotide oligomers which comprises every possible combination of nucleotides which are complementary to all of the nucleotides in the strands under conditions in which the oligomers are capable of hybridising to the strands and (ii) ligating together those oligomers that hybridise to the strands to form the plurality of modified double stranded polynucleotides. Conditions that permit the hybridisation are well-known in the art (for example, Sambrook et al., 2001, Molecular Cloning: a laboratory manual, 3rd edition, Cold Spring Harbor Laboratory Press; and Current Protocols in Molecular Biology, Chapter 2, Ausubel et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995)). Hybridisation can be carried out under low stringency conditions, for example in the presence of a buffered solution of 30 to 35% formamide, 1 M NaCl and 1% SDS (sodium dodecyl sulfate) at 37° C. followed by a wash in from 1× (0.1650 M Na+) to 2× (0.33 M Na+) SSC (standard sodium citrate) at 50° C. Hybridisation can be carried out under moderate stringency conditions, for example in the presence of a buffered solution of 40 to 45% formamide, 1 M NaCl, and 1% SDS at 37° C., followed by a wash in from 0.5× (0.0825 M Na+) to 1× (0.1650 M Na+) SSC at 55° C. Hybridisation can be carried out under high stringency conditions, for example in the presence of a buffered solution of 50% formamide, 1 M NaCl, 1% SDS at 37° C., followed by a wash in 0.1× (0.0165 M Na+) SSC at 60° C. Preferred conditions are 10 μM oligomers in 10 mM Tris-HCl, 50 mM NaCl, pH 7, with heating to 98° C. before cooling to 18° C. at 2° C. per minute.
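The sodium ion concentrations quoted for the SSC washes scale linearly with the stated fold concentration. A minimal Python sketch of that arithmetic follows; the function and constant names are hypothetical, and the only input taken from the text is that 1× SSC corresponds to 0.1650 M Na+.

```python
# Sodium ion concentration of 1x SSC as stated in the text (mol/L).
SSC_1X_SODIUM_MOLAR = 0.1650


def ssc_sodium_molar(fold):
    """Na+ concentration (mol/L) of SSC diluted or concentrated to `fold` x."""
    return fold * SSC_1X_SODIUM_MOLAR


# Wash conditions quoted in the text for each stringency level:
# (lowest fold, highest fold, wash temperature in degrees C).
WASHES = {
    "low": (1.0, 2.0, 50),
    "moderate": (0.5, 1.0, 55),
    "high": (0.1, 0.1, 60),
}

for stringency, (low_fold, high_fold, temp_c) in WASHES.items():
    print(
        f"{stringency}: {ssc_sodium_molar(low_fold):.4f} to "
        f"{ssc_sodium_molar(high_fold):.4f} M Na+ at {temp_c} C"
    )
```

The computed values agree with the figures given in parentheses above (0.0165 M, 0.0825 M, 0.1650 M and 0.33 M).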

The oligomers in the population typically have from 2 to 16 nucleotides. All of the oligomers in the population may have 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 or 16 nucleotides. The oligomers in the population may have different lengths. All of the oligomers in the population preferably have the same length. The oligomers may comprise any of the nucleotides discussed above. The nucleotides are complementary to the nucleotides in the strands to which the oligomers hybridise. It is straightforward for a person skilled in the art to identify nucleotides that are complementary to those nucleotides. A nucleotide is complementary to another nucleotide if it hybridises through base pairing, preferably Watson and Crick base pairing, to the nucleotide. A complementary nucleotide may hybridise to other nucleotides with which it is not complementary, but to a smaller degree than it hybridises to the nucleotide with which it is complementary. Each nucleotide (N) in the oligomers preferably comprises the nucleobases adenine (A), uracil (U), guanine (G) or cytosine (C). Alternatively, N preferably comprises the nucleobases A, thymine (T), G or C. A is complementary to T or U and vice versa. G is complementary to C and vice versa.

The population comprises every possible combination of nucleotides which are complementary to all of the nucleotides in the strands. This means that the oligomers will hybridise to most, if not all, of the strands whatever their sequences. For instance, if N comprises the nucleobases adenine (A), uracil (U), guanine (G) or cytosine (C), the population comprises every possible combination of A, U, G and C. Similarly, if N comprises the nucleobases A, thymine (T), G or C, the population comprises every possible combination of A, T, G and C.

It is straightforward to design and obtain a population of oligomers having the requisite combination. For instance, if all of the oligomers in the population comprise or consist of NN and N is A, T, G or C, then the population comprises AT, AG, AC, TA, TG, TC, GA, GT, GC, CA, CT and CG. Similarly, if all of the oligomers in the population comprise or consist of NNN and N is A, T, G or C, then the population comprises ATG, ATC, AGT, AGC, ACT, ACG, TAG, TAC, TGA, TGC, TCA, TCG, GAT, GAC, GTA, GTC, GCA, GCT, CAT, CAG, CTA, CTG, CGA and CGT. Once the generic formula, such as NN or NNN, has been designed, populations comprising all of the possible combinations of N are commercially available, for instance from Integrated DNA Technologies (IDT), Sigma and Invitrogen.
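Oligomer populations of this kind can be enumerated programmatically. The Python sketch below is illustrative only and its function names are hypothetical. It reproduces the twelve NN and twenty-four NNN oligomers listed above, which correspond to ordered selections in which no base is repeated. A fully degenerate pool, in which each position may independently be any base, would additionally contain oligomers such as AA or TTG; the second function shows that case for comparison, and which of the two is intended in a given embodiment is not assumed here.

```python
from itertools import permutations, product


def oligomers_without_repeats(bases, length):
    """Ordered selections of `length` distinct bases (matches the lists above)."""
    return ["".join(combo) for combo in permutations(bases, length)]


def oligomers_fully_degenerate(bases, length):
    """Every sequence of `length` bases, with each position chosen independently."""
    return ["".join(combo) for combo in product(bases, repeat=length)]


DNA_BASES = "ATGC"

nn_listed = oligomers_without_repeats(DNA_BASES, 2)
nnn_listed = oligomers_without_repeats(DNA_BASES, 3)
nn_all = oligomers_fully_degenerate(DNA_BASES, 2)
nnn_all = oligomers_fully_degenerate(DNA_BASES, 3)

print(len(nn_listed), len(nnn_listed))  # 12 24
print(len(nn_all), len(nnn_all))        # 16 64
print(nn_listed)
```

The number of fully degenerate oligomers grows as 4 to the power of the oligomer length, which is one practical reason for ordering such pools from a supplier using a generic formula rather than specifying each member.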

The oligomers are capable of being ligated together in accordance with the invention. All of the oligomers in the population preferably have a phosphate group or an adenylate group at the 5′ end.

The hybridised oligomers may be ligated together using any method known in the art. The oligomers are preferably ligated using a ligase, such as T4 DNA ligase, E. coli DNA ligase, Taq DNA ligase, Tma DNA ligase or 9° N DNA ligase.

The oligomers may also be chemically ligated if reactive groups are present on the ends of the oligomers. In such embodiments, steps need to be taken to prevent the oligomers from ligating to each other in solution. The ligation reaction is typically initiated using the hairpin on a strand as a primer.

In a preferred embodiment, the method comprises contacting the plurality of double stranded constructs with a polymerase such that the polymerase simultaneously separates the two strands of the double stranded constructs and uses the strands as templates to form the plurality of modified double stranded polynucleotides. Any of the polymerases discussed above or below may be used. The polymerase may form new strands comprising any of the nucleotides discussed above and below. The polymerase is provided with a population of free nucleotides which complement the nucleotides in the template strands. The polymerase may use the free nucleotides to form the new strands.

Modified Polynucleotides

If the polymerase uses the strands as templates to form a plurality of modified double stranded polynucleotides, the method may comprise contacting the strands with a polymerase and a population of free nucleotides under conditions in which the polymerase uses the strands as templates to form a plurality of modified double stranded polynucleotides, wherein the polymerase replaces one or more of the nucleotide species in the strands with a different nucleotide species when forming the modified double stranded polynucleotides. The polymerase may be used to simultaneously separate the strands as discussed above. This type of modification is described in UK Application No. 1403096.9. Any of the polymerases discussed above or below may be used. The polymerase is preferably Klenow or 9° North. Suitable conditions are discussed above.

Characterisation, such as sequencing, of a polynucleotide using a transmembrane pore typically involves analyzing polymer units made up of k nucleotides where k is a positive integer (i.e. ‘k-mers’). This is discussed in International Application No. PCT/GB2012/052343 (published as WO 2013/041878). While it is desirable to have clear separation between current measurements for different k-mers, it is common for some of these measurements to overlap. Especially with high numbers of polymer units in the k-mer, i.e. high values of k, it can become difficult to resolve the measurements produced by different k-mers, to the detriment of deriving information about the polynucleotide, for example an estimate of the underlying sequence of the polynucleotide.

By replacing one or more nucleotide species in the strands with different nucleotide species in the new strands (i.e. the strands produced using the polymerase) of the modified double stranded polynucleotides, the new strands contain k-mers which differ from those in the template strands. The different k-mers in the new strands are capable of producing different current measurements from the k-mers in the template strands and so the new strands provide different information from the template strands. The additional information from the new strands can make it easier to characterise the modified double stranded polynucleotides and hence the template polynucleotide. In some instances, the modified double stranded polynucleotides themselves may be easier to characterise. For instance, the modified double stranded polynucleotides may be designed to include k-mers with an increased separation or a clear separation between their current measurements or k-mers which have a decreased noise.
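The way in which a change of nucleotide species alters the k-mer content of a strand can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the seven-nucleotide strand is hypothetical, the letter "I" is used merely as a label for a different nucleotide species, and the functions `kmers` and `replace_species` are invented for the example. The sketch relabels one species within a single strand and does not model complementarity or current measurements.

```python
def kmers(strand, k):
    """Return the overlapping k-mers of a strand, in order of occurrence."""
    return [strand[i:i + k] for i in range(len(strand) - k + 1)]


def replace_species(strand, old, new):
    """Relabel every occurrence of one nucleotide species with another."""
    return strand.replace(old, new)


# Hypothetical strand, for illustration only.
original = "ACGTGCA"
modified = replace_species(original, "G", "I")

original_kmers = kmers(original, 3)
modified_kmers = kmers(modified, 3)
shared = set(original_kmers) & set(modified_kmers)

print(original_kmers)  # ['ACG', 'CGT', 'GTG', 'TGC', 'GCA']
print(modified_kmers)  # ['ACI', 'CIT', 'ITI', 'TIC', 'ICA']
print(len(shared))     # 0
```

In this example every 3-mer contains the replaced species, so none of the k-mers in the relabelled strand coincides with a k-mer in the original strand.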

The polymerase preferably replaces two or more of the nucleotide species in the template strands with different nucleotide species when forming the modified double stranded polynucleotides. The polymerase may replace each of the two or more nucleotide species in the template strands with a distinct nucleotide species. The polymerase may replace each of the two or more nucleotide species in the template strands with the same nucleotide species.

If the template strands are DNA, the different nucleotide species typically comprises a nucleobase which differs from adenine, guanine, thymine, cytosine or methylcytosine and/or comprises a nucleoside which differs from deoxyadenosine, deoxyguanosine, thymidine, deoxycytidine or deoxymethylcytidine. If the template strands are RNA, the different nucleotide species in the modified polynucleotide typically comprises a nucleobase which differs from adenine, guanine, uracil, cytosine or methylcytosine and/or comprises a nucleoside which differs from adenosine, guanosine, uridine, cytidine or methylcytidine.

The different nucleotide species may be a universal nucleotide. A universal nucleotide is one which will hybridise or bind to some degree to all of the nucleotides in the template strands. A universal nucleotide is preferably one which will hybridise or bind to some degree to nucleotides comprising the nucleobases adenine (A), thymine (T), uracil (U), guanine (G) and cytosine (C). The universal nucleotide may hybridise or bind more strongly to some nucleotides than to others. For instance, a universal nucleotide (I) comprising the nucleoside 2′-deoxyinosine will show a preferential order of pairing of I-C > I-A > I-G ≈ I-T. The polymerase will replace a nucleotide species with a universal nucleotide if the universal nucleotide takes the place of the nucleotide species in the population. For instance, the polymerase will replace dGMP with a universal nucleotide if it is contacted with a population of free dAMP, dTMP, dCMP and the universal nucleotide.
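The replacement of dGMP by a universal nucleotide when dGMP is absent from the population of free nucleotides can be modelled with a short Python sketch. The model is deliberately simple and rests on assumptions: the template is hypothetical, single letters stand for nucleotide species with "I" denoting the universal nucleotide, the function name `synthesise` is invented, and the polymerase is assumed to insert the universal nucleotide whenever the usual Watson and Crick partner is not available in the pool.

```python
# Watson-Crick partners for DNA nucleobases.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesise(template_5to3, pool, universal="I"):
    """Build the new strand (written 5'->3') against a template (5'->3').

    If the usual partner of a template base is missing from `pool`,
    the universal nucleotide is inserted instead (an assumption of this sketch).
    """
    new_strand_3to5 = []
    for base in template_5to3:
        partner = PARTNER[base]
        new_strand_3to5.append(partner if partner in pool else universal)
    # The new strand is antiparallel to the template, so reverse it.
    return "".join(reversed(new_strand_3to5))


# Hypothetical template, for illustration only.
template = "ACGTC"

# Full pool: an ordinary reverse complement is produced.
natural = synthesise(template, pool={"A", "T", "G", "C"})

# Pool of dAMP, dTMP, dCMP and the universal nucleotide (no dGMP).
modified = synthesise(template, pool={"A", "T", "C", "I"})

print(natural)   # GACGT
print(modified)  # IACIT
```

Each position at which dGMP would otherwise have been incorporated, i.e. opposite each C in the template, carries the universal nucleotide in the new strand.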

The universal nucleotide preferably comprises one of the following nucleobases: hypoxanthine, 4-nitroindole, 5-nitroindole, 6-nitroindole, formylindole, 3-nitropyrrole, nitroimidazole, 4-nitropyrazole, 4-nitrobenzimidazole, 5-nitroindazole, 4-aminobenzimidazole or phenyl (C6-aromatic ring). The universal nucleotide more preferably comprises one of the following nucleosides: 2′-deoxyinosine, inosine, 7-deaza-2′-deoxyinosine, 7-deaza-inosine, 2-aza-deoxyinosine, 2-aza-inosine, 2′-O-methylinosine, 4-nitroindole 2′-deoxyribonucleoside, 4-nitroindole ribonucleoside, 5-nitroindole 2′-deoxyribonucleoside, 5-nitroindole ribonucleoside, 6-nitroindole 2′-deoxyribonucleoside, 6-nitroindole ribonucleoside, 3-nitropyrrole 2′-deoxyribonucleoside, 3-nitropyrrole ribonucleoside, an acyclic sugar analogue of hypoxanthine, nitroimidazole 2′-deoxyribonucleoside, nitroimidazole ribonucleoside, 4-nitropyrazole 2′-deoxyribonucleoside, 4-nitropyrazole ribonucleoside, 4-nitrobenzimidazole 2′-deoxyribonucleoside, 4-nitrobenzimidazole ribonucleoside, 5-nitroindazole 2′-deoxyribonucleoside, 5-nitroindazole ribonucleoside, 4-aminobenzimidazole 2′-deoxyribonucleoside, 4-aminobenzimidazole ribonucleoside, phenyl C-ribonucleoside, phenyl C-2′-deoxyribosyl nucleoside, 2′-deoxynebularine, 2′-deoxyisoguanosine, K-2′-deoxyribose, P-2′-deoxyribose and pyrrolidine. The universal nucleotide more preferably comprises 2′-deoxyinosine. The universal nucleotide is more preferably IMP or dIMP. The universal nucleotide is most preferably dPMP (2′-Deoxy-P-nucleoside monophosphate) or dKMP (N6-methoxy-2,6-diaminopurine monophosphate).

The different nucleotide species preferably comprises a chemical atom or group absent from the nucleotide species it is replacing. The chemical group is preferably a propynyl group, a thio group, an oxo group, a methyl group, a hydroxymethyl group, a formyl group, a carboxy group, a carbonyl group, a benzyl group, a propargyl group or a propargylamine group. The chemical group or atom may be or may comprise a fluorescent molecule, biotin, digoxigenin, DNP (dinitrophenol), a photo-labile group, an alkyne, DBCO, azide, free amino group, a redox dye, a mercury atom or a selenium atom.

Commercially available nucleosides comprising chemical groups which are absent from naturally-occurring nucleosides include, but are not limited to, 6-Thio-2′-deoxyguanosine, 7-Deaza-2′-deoxyadenosine, 7-Deaza-2′-deoxyguanosine, 7-Deaza-2′-deoxyxanthosine, 7-Deaza-8-aza-2′-deoxyadenosine, 8-5′(5'S)-Cyclo-2′-deoxyadenosine, 8-Amino-2′-deoxyadenosine, 8-Amino-2′-deoxyguanosine, 8-Deuterated-2′-deoxyguanosine, 8-Oxo-2′-deoxyadenosine, 8-Oxo-2′-deoxyguanosine, Etheno-2′-deoxyadenosine, N6-Methyl-2′-deoxyadenosine, O6-Methyl-2′-deoxyguanosine, O6-Phenyl-2′-deoxyinosine, 2′-Deoxypseudouridine, 2-Thiothymidine, 4-Thio-2′-deoxyuridine, 4-Thiothymidine, 5′-Aminothymidine, 5-(1-Pyrenylethynyl)-2′-deoxyuridine, 5-(C2-EDTA)-2′-deoxyuridine, 5-(Carboxy)vinyl-2′-deoxyuridine, 5,6-Dihydro-2′-deoxyuridine, 5,6-Dihydrothymidine, 5-Bromo-2′-deoxycytidine, 5-Bromo-2′-deoxyuridine, 5-Carboxy-2′-deoxycytidine, 5-Fluoro-2′-deoxyuridine, 5-Formyl-2′-deoxycytidine, 5-Hydroxy-2′-deoxycytidine, 5-Hydroxy-2′-deoxyuridine, 5-Hydroxymethyl-2′-deoxycytidine, 5-Hydroxymethyl-2′-deoxyuridine, 5-Iodo-2′-deoxycytidine, 5-Iodo-2′-deoxyuridine, 5-Methyl-2′-deoxycytidine, 5-Methyl-2′-deoxyisocytidine, 5-Propynyl-2′-deoxycytidine, 5-Propynyl-2′-deoxyuridine, 6-O-(TMP)-5-F-2′-deoxyuridine, C4-(1,2,4-Triazol-1-yl)-2′-deoxyuridine, C8-Alkyne-thymidine, dT-Ferrocene, N4-Ethyl-2′-deoxycytidine, O4-Methylthymidine, Pyrrolo-2′-deoxycytidine, Thymidine Glycol, 4-Thiouridine, 5-Methylcytidine, 5-Methyluridine, Pyrrolocytidine, 3-Deaza-5-Aza-2′-O-methylcytidine, 5-Fluoro-2′-O-Methyluridine, 5-Fluoro-4-O-TMP-2′-O-Methyluridine, 5-Methyl-2′-O-Methylcytidine, 5-Methyl-2′-O-Methylthymidine, 2′,3′-Dideoxyadenosine, 2′,3′-Dideoxycytidine, 2′,3′-Dideoxyguanosine, 2′,3′-Dideoxythymidine, 3′-Deoxyadenosine, 3′-Deoxycytidine, 3′-Deoxyguanosine, 3′-Deoxythymidine and 5′-O-Methylthymidine. The different nucleotide species may comprise any of these nucleosides.

Alternatively, the different nucleotide species preferably lacks a chemical group or atom present in the nucleotide species it is replacing.

The different nucleotide species preferably has an altered electronegativity compared with the one or more nucleotides being replaced. The different nucleotide species having an altered electronegativity preferably comprises a halogen atom. The halogen atom may be attached to any position on the different nucleotide species, such as the nucleobase and/or the sugar. The halogen atom is preferably fluorine (F), chlorine (Cl), bromine (Br) or iodine (I). The halogen atom is most preferably F or I.

Commercially available nucleosides comprising a halogen include, but are not limited to, 8-Bromo-2′-deoxyadenosine, 8-Bromo-2′-deoxyguanosine, 5-Bromouridine, 5-Iodouridine, 5′-Iodothymidine and 5-Bromo-2′-O-methyluridine. The different nucleotide species may comprise any of these nucleosides.

The method preferably further comprises selectively removing the nucleobases from the one or more different nucleotide species in the modified double stranded polynucleotides. This results in abasic nucleotides in the modified double stranded polynucleotides. An abasic nucleotide is a nucleotide that lacks a nucleobase. The abasic nucleotide typically contains a sugar and at least one phosphate group. The sugar is typically a pentose sugar, such as ribose or deoxyribose. The abasic nucleotide is typically an abasic ribonucleotide or an abasic deoxyribonucleotide. The abasic nucleotide typically contains a monophosphate, diphosphate or triphosphate. Phosphates may be attached on the 5′ or 3′ side of an abasic nucleotide.

The nucleobases may be selectively removed using any method known in the art. For instance, certain DNA repair proteins, such as human alkyladenine DNA glycosylase (hAAG), are capable of selectively removing 3-methyl adenine, 7-methyl guanine, 1,N6-ethenoadenine and hypoxanthine from nucleotides. Also, dUMP can be selectively removed using uracil DNA glycosylase.

Additional Polymerase Step

In another preferred embodiment, the amount of information in the modified double stranded polynucleotides is doubled to facilitate characterisation of the template polynucleotide. An example of this is shown in FIG. 8. The method preferably comprises (d) separating the two strands of the modified double stranded polynucleotides and using the strands as templates to form a plurality of adapted double stranded polynucleotides each comprising two complementary strands linked by at least one hairpin loop, wherein each complementary strand comprises two complementary sequences. One of the two complementary sequences in each complementary strand is derived from the template double stranded polynucleotide. Step (d) typically comprises, before separation, attaching a hairpin loop to the modified double stranded polynucleotides at the other end of the modified double stranded polynucleotides from the at least one hairpin loop which links the complementary strands. This hairpin loop preferably does not link the strands of the modified double stranded polynucleotides. The hairpin may form a nucleation point for the polymerase. When the separated strands of the modified double stranded polynucleotides are used as templates, the attached hairpin loop is also used as a template and links the two complementary strands of the adapted double stranded polynucleotides, i.e. links the template strands from the modified double stranded polynucleotides and the new strands formed from the templates.

Step (d) may be carried out in any of the ways discussed above. For instance, step (d) may comprise separating the two strands of the modified double stranded polynucleotides by increasing one or more of pH, temperature and ionic strength. Step (d) may comprise contacting the separated strands with a polymerase such that the polymerase uses the strands as templates to form the plurality of adapted double stranded polynucleotides. Step (d) may comprise (i) contacting the plurality of separated strands with a population of nucleotide oligomers which comprises every possible combination of nucleotides which are complementary to all of the nucleotides in the strands under conditions in which the oligomers are capable of hybridising to the strands and (ii) ligating together those oligomers that hybridise to the strands to form the plurality of adapted double stranded polynucleotides. Step (d) may comprise contacting the plurality of modified double stranded polynucleotides with a polymerase such that the polymerase simultaneously separates the two strands of the modified double stranded polynucleotides and uses the strands as templates to form the plurality of adapted double stranded polynucleotides. Any of the embodiments discussed above may apply to step (d). For instance, step (d) may comprise replacing one or more nucleotide species in the template strands with different nucleotide species in the new strands.

Y Adaptors

If each substrate does not comprise a leader sequence, the method preferably further comprises attaching Y adaptors to the plurality of modified double stranded polynucleotides at the opposite ends from the hairpin loops. The Y adaptors are typically polynucleotide adaptors. They may be formed from any of the polynucleotides discussed above. The Y adaptors typically comprise (a) a double stranded region and (b) a single stranded region or a region that is not complementary at the other end. The Y adaptors may be described as having an overhang if they comprise a single stranded region. The presence of a non-complementary region in the Y adaptors gives them their Y shape since the two strands typically do not hybridise to each other unlike the double stranded portion. The Y adaptors may comprise one or more anchors as discussed in more detail below.

The Y adaptors may be ligated to the modified double stranded polynucleotides. Ligation may be carried out using any method known in the art. For instance, the Y adaptors may be ligated using a ligase, such as T4 DNA ligase, E. coli DNA ligase, Taq DNA ligase, Tma DNA ligase and 9° N DNA ligase.

Products of the Invention

The invention also provides a population of double stranded MuAsubstrates for modifying a template polynucleotide, wherein eachsubstrate comprises at least one overhang of universal nucleotides. Theinvention also provides a population of double stranded MuA substratesfor modifying a template polynucleotide, wherein each substratecomprises (i) at least one overhang and (ii) at least one hairpin loopin the opposite strand from the strand comprising the at least oneoverhang. The substrates may be any of those described above. Thesubstrates preferably comprise a double stranded portion as definedabove. The double stranded portion preferably comprises SEQ ID NOs: 26and 27 as discussed above. The double stranded portion more preferablycomprises SEQ ID NOs: 26 and 28 as discussed above. Preferredpopulations of the invention are those in which each substrate comprisesan overhang at one end and a hairpin loop at the other end.

The invention also provides a plurality of polynucleotides modifiedusing the method of the invention. The plurality of polynucleotides maybe in any of the forms discussed above. The modified double strandedpolynucleotides comprise two complementary strands comprising a doublestranded fragment of the template polynucleotide linked by a hairpinloop.

The population or plurality may be isolated, substantially isolated,purified or substantially purified. A population or plurality isisolated or purified if it is completely free of any other components,such as the template polynucleotide, lipids or pores. A population orplurality is substantially isolated if it is mixed with carriers ordiluents which will not interfere with its intended use. For instance, apopulation or plurality is substantially isolated or substantiallypurified if it is present in a form that comprises less than 10%, lessthan 5%, less than 2% or less than 1% of other components, such aslipids or pores.

Characterisation Methods

The invention also provides methods of characterising at least onepolynucleotide modified using a method of the invention. The modifiedpolynucleotide is contacted with a transmembrane pore such that at leastone strand of the polynucleotide moves through the pore. One or moremeasurements are taken as the at least one strand moves with respect tothe pore. The measurements are indicative of one or more characteristicsof the at least one strand and this allows characterisation of themodified polynucleotide.

The invention also provides methods of characterising a template polynucleotide. The template polynucleotide is modified using the invention to produce a plurality of modified polynucleotides. Each modified polynucleotide is contacted with a transmembrane pore such that at least one strand of each polynucleotide moves through the pore. One or more measurements are taken as each polynucleotide moves with respect to the pore. The measurements are indicative of one or more characteristics of each polynucleotide and this allows the template polynucleotide to be characterised.

In a preferred embodiment, both strands of the/each modifiedpolynucleotide move through the pore. If both strands move through thepore, the two strands are typically separated. The two strands may beseparated using any method known in the art. For instance, they may beseparated by a polynucleotide binding protein or using conditions whichfavour dehybridisation (examples of conditions which favourdehybridisation include, but are not limited to, high temperature, highpH and the addition of agents that can disrupt hydrogen bonding or basepairing, such as formamide and urea).

Transmembrane Pore

A transmembrane pore is a structure that crosses the membrane to somedegree. It permits hydrated ions driven by an applied potential to flowacross or within the membrane. The transmembrane pore typically crossesthe entire membrane so that hydrated ions may flow from one side of themembrane to the other side of the membrane. However, the transmembranepore does not have to cross the membrane. It may be closed at one end.For instance, the pore may be a well, gap, channel, trench or slit inthe membrane along which or into which hydrated ions may flow.

The one or more selectively amplified probes or one or moreamplification products are preferably characterised by (i) contactingthe probes or amplification products with a transmembrane pore such thatthe probes or amplification products move through the pore and (ii)taking one or more measurements as the probes or amplification productsmove with respect to the pore wherein the measurements are indicative ofone or more characteristics of the probes or amplification products andthereby characterising the probes or amplification products.

Any transmembrane pore may be used in the invention. The pore may bebiological or artificial. Suitable pores include, but are not limitedto, protein pores, polynucleotide pores and solid state pores. The poremay be a DNA origami pore (Langecker et al., Science, 2012; 338:932-936).

The transmembrane pore is preferably a transmembrane protein pore.

Transmembrane protein pores for use in accordance with the invention can be derived from β-barrel pores or α-helix bundle pores. β-barrel pores comprise a barrel or channel that is formed from β-strands. Suitable β-barrel pores include, but are not limited to, β pore forming toxins, such as α-hemolysin, anthrax toxin and leukocidins, and outer membrane proteins/porins of bacteria, such as Mycobacterium smegmatis porin (Msp), for example MspA, MspB, MspC or MspD, outer membrane porin F (OmpF), outer membrane porin G (OmpG), outer membrane phospholipase A and Neisseria autotransporter lipoprotein (NalP), and other pores, such as lysenin. α-helix bundle pores comprise a barrel or channel that is formed from α-helices. Suitable α-helix bundle pores include, but are not limited to, inner membrane proteins and α outer membrane proteins, such as WZA and ClyA toxin. The transmembrane pore may be derived from lysenin. Suitable pores derived from lysenin are disclosed in International Application No. PCT/GB2013/050667 (published as WO 2013/153359). The transmembrane pore may be derived from Msp, such as MspA, or from α-hemolysin (α-HL). The wild type α-HL pore is formed of seven identical monomers or subunits (i.e. it is heptameric). The sequence of one monomer or subunit of α-hemolysin-NN is shown in SEQ ID NO: 4.

The transmembrane protein pore is preferably derived from Msp,preferably from MspA. Such a pore will be oligomeric and typicallycomprises 7, 8, 9 or 10 monomers derived from Msp. The pore may be ahomo-oligomeric pore derived from Msp comprising identical monomers.Alternatively, the pore may be a hetero-oligomeric pore derived from Mspcomprising at least one monomer that differs from the others. Preferablythe pore is derived from MspA or a homolog or paralog thereof.

A monomer derived from Msp typically comprises the sequence shown in SEQ ID NO: 2 or a variant thereof. SEQ ID NO: 2 is the MS-(B1)8 mutant of the MspA monomer. It includes the following mutations: D90N, D91N, D93N, D118R, D134R and E139K. A variant of SEQ ID NO: 2 is a polypeptide that has an amino acid sequence which varies from that of SEQ ID NO: 2 and which retains its ability to form a pore. Suitable variants are disclosed in International Application No. PCT/GB2012/050301 (published as WO 2012/107778) and UK Application No. 1407809.1 (ONT IP 057). A preferred variant of SEQ ID NO: 2 comprises N93D. The ability of a variant to form a pore can be assayed using any method known in the art. For instance, the variant may be inserted into an amphiphilic layer along with other appropriate subunits and its ability to oligomerise to form a pore may be determined. Methods are known in the art for inserting subunits into membranes, such as amphiphilic layers. For example, subunits may be suspended in a purified form in a solution containing a triblock copolymer membrane such that they diffuse to the membrane and are inserted by binding to the membrane and assembling into a functional state. Alternatively, subunits may be directly inserted into the membrane using the “pick and place” method described in M. A. Holden, H. Bayley. J. Am. Chem. Soc. 2005, 127, 6502-6503 and International Application No. PCT/GB2006/001057 (published as WO 2006/100484).

Over the entire length of the amino acid sequence of SEQ ID NO: 2, avariant will preferably be at least 50% homologous to that sequencebased on amino acid identity. More preferably, the variant may be atleast 55%, at least 60%, at least 65%, at least 70%, at least 75%, atleast 80%, at least 85%, at least 90% and more preferably at least 95%,97% or 99% homologous based on amino acid identity to the amino acidsequence of SEQ ID NO: 2 over the entire sequence. There may be at least80%, for example at least 85%, 90% or 95%, amino acid identity over astretch of 100 or more, for example 125, 150, 175 or 200 or more,contiguous amino acids (“hard homology”).
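As a minimal sketch of how homology based on amino acid identity over the entire length of a reference sequence might be evaluated, the following Python assumes that the reference and the variant have already been aligned to equal length with "-" as the gap character. The function names, the gap convention and the choice to count only reference positions in the denominator are assumptions made for illustration; they are not a definition taken from this disclosure.

```python
def percent_identity(ref_aligned: str, var_aligned: str) -> float:
    """Percentage of reference residues matched by an identical variant residue.

    Both inputs are assumed to be pre-aligned strings of equal length in which
    '-' marks a gap. Gap positions in the reference are not counted.
    """
    if len(ref_aligned) != len(var_aligned):
        raise ValueError("aligned sequences must have equal length")
    ref_length = sum(1 for r in ref_aligned if r != "-")
    if ref_length == 0:
        raise ValueError("reference contains no residues")
    matches = sum(
        1 for r, v in zip(ref_aligned, var_aligned) if r != "-" and r == v
    )
    return 100.0 * matches / ref_length


def meets_identity_threshold(ref_aligned: str, var_aligned: str,
                             threshold: float = 50.0) -> bool:
    """True if the variant reaches the given identity threshold (default 50%)."""
    return percent_identity(ref_aligned, var_aligned) >= threshold


# Toy ten-residue example with a single substitution at the last position.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

In practice the alignment itself would be produced by a dedicated alignment tool; the sketch covers only the identity calculation applied afterwards.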

Any of the proteins described herein, such as the transmembrane proteinpores, may be made synthetically or by recombinant means. For example,the pore may be synthesised by in vitro translation and transcription(IVTT). The amino acid sequence of the pore may be modified to includenon-naturally occurring amino acids or to increase the stability of theprotein. When a protein is produced by synthetic means, such amino acidsmay be introduced during production. The pore may also be alteredfollowing either synthetic or recombinant production.

Characterisation

The method may involve measuring two, three, four or five or morecharacteristics of the modified or template polynucleotide. The one ormore characteristics are preferably selected from (i) the length of thepolynucleotide, (ii) the identity of the polynucleotide, (iii) thesequence of the polynucleotide, (iv) the secondary structure of thepolynucleotide and (v) whether or not the polynucleotide is modified.Any combination of (i) to (v) may be measured in accordance with theinvention, such as {i}, {ii}, {iii}, {iv}, {v}, {i,ii}, {i,iii}, {i,iv},{i,v}, {ii,iii}, {ii,iv}, {ii,v}, {iii,iv}, {iii,v}, {iv,v}, {i,ii,iii},{i,ii,iv}, {i,ii,v}, {i,iii,iv}, {i,iii,v}, {i,iv,v}, {ii,iii,iv},{ii,iii,v}, {ii,iv,v}, {iii,iv,v}, {i,ii,iii,iv}, {i,ii,iii,v},{i,ii,iv,v}, {i,iii,iv,v}, {ii,iii,iv,v} or {i,ii,iii,iv,v}. Differentcombinations of (i) to (v) may be measured for the first polynucleotidecompared with the second polynucleotide, including any of thosecombinations listed above.
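The combinations of (i) to (v) listed above are the non-empty subsets of five characteristics. The following Python sketch, offered purely as an illustration with hypothetical names, enumerates them in the same order as the list.

```python
from itertools import combinations

# Labels for the five characteristics (i) to (v) described in the text.
CHARACTERISTICS = ("i", "ii", "iii", "iv", "v")


def all_combinations(items=CHARACTERISTICS):
    """Return every non-empty combination of the items, smallest first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


combos = all_combinations()
print(len(combos))  # number of distinct combinations
for combo in combos:
    print("{" + ",".join(combo) + "}")
```

For five characteristics this yields 2^5 - 1 = 31 combinations, matching the number of entries in the list above.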

For (i), the length of the polynucleotide may be measured for example bydetermining the number of interactions between the polynucleotide andthe pore or the duration of interaction between the polynucleotide andthe pore.

For (ii), the identity of the polynucleotide may be measured in a numberof ways. The identity of the polynucleotide may be measured inconjunction with measurement of the sequence of the polynucleotide orwithout measurement of the sequence of the polynucleotide. The former isstraightforward; the polynucleotide is sequenced and thereby identified.The latter may be done in several ways. For instance, the presence of aparticular motif in the polynucleotide may be measured (withoutmeasuring the remaining sequence of the polynucleotide). Alternatively,the measurement of a particular electrical and/or optical signal in themethod may identify the polynucleotide as coming from a particularsource.

For (iii), the sequence of the polynucleotide can be determined asdescribed previously. Suitable sequencing methods, particularly thoseusing electrical measurements, are described in Stoddart D et al., ProcNatl Acad Sci, 12; 106(19):7702-7, Lieberman K R et al, J Am Chem Soc.2010; 132(50):17961-72, and International Application WO 2000/28312.

For (iv), the secondary structure may be measured in a variety of ways.For instance, if the method involves an electrical measurement, thesecondary structure may be measured using a change in dwell time or achange in current flowing through the pore. This allows regions ofsingle-stranded and double-stranded polynucleotide to be distinguished.

For (v), the presence or absence of any modification may be measured.The method preferably comprises determining whether or not thepolynucleotide is modified by methylation, by oxidation, by damage, withone or more proteins or with one or more labels, tags or spacers.Specific modifications will result in specific interactions with thepore which can be measured using the methods described below. Forinstance, methylcytosine may be distinguished from cytosine on the basisof the current flowing through the pore during its interaction with eachnucleotide.

The polynucleotide is contacted with a transmembrane pore. The pore istypically present in a membrane. Suitable membranes are discussed below.The method may be carried out using any apparatus that is suitable forinvestigating a membrane/pore system in which a pore is present in amembrane. The method may be carried out using any apparatus that issuitable for transmembrane pore sensing. For example, the apparatuscomprises a chamber comprising an aqueous solution and a barrier thatseparates the chamber into two sections. The barrier typically has anaperture in which the membrane containing the pore is formed.Alternatively the barrier forms the membrane in which the pore ispresent.

The method may be carried out using the apparatus described inInternational Application No. PCT/GB08/000562 (WO 2008/102120).

A variety of different types of measurements may be made. This includeswithout limitation: electrical measurements and optical measurements.Possible electrical measurements include: current measurements,impedance measurements, tunnelling measurements (Ivanov A P et al., NanoLett. 2011 Jan. 12; 11(1):279-85), and FET measurements (InternationalApplication WO 2005/124888). Optical measurements may be combined withelectrical measurements (Soni G V et al., Rev Sci Instrum, 2010 January;81(1):014301). The measurement may be a transmembrane currentmeasurement such as measurement of ionic current flowing through thepore.

Electrical measurements may be made using standard single channel recording equipment as described in Stoddart D et al., Proc Natl Acad Sci, 12; 106(19):7702-7, Lieberman K R et al, J Am Chem Soc. 2010; 132(50):17961-72, and International Application WO 2000/28312. Alternatively, electrical measurements may be made using a multi-channel system, for example as described in International Application WO 2009/077734 and International Application WO 2011/067559.

The method is preferably carried out with a potential applied across the membrane. The applied potential may be a voltage potential. Alternatively, the applied potential may be a chemical potential. An example of this is using a salt gradient across a membrane, such as an amphiphilic layer. A salt gradient is disclosed in Holden et al., J Am Chem Soc. 2007 Jul. 11; 129(27):8650-5. In some instances, the current passing through the pore as a polynucleotide moves with respect to the pore is used to estimate or determine the sequence of the polynucleotide. This is strand sequencing.

The method may involve measuring the current passing through the pore asthe polynucleotide moves with respect to the pore. Therefore theapparatus used in the method may also comprise an electrical circuitcapable of applying a potential and measuring an electrical signalacross the membrane and pore. The methods may be carried out using apatch clamp or a voltage clamp. The methods preferably involve the useof a voltage clamp.

The method of the invention may involve the measuring of a currentpassing through the pore as the polynucleotide moves with respect to thepore. Suitable conditions for measuring ionic currents throughtransmembrane protein pores are known in the art and disclosed in theExample. The method is typically carried out with a voltage appliedacross the membrane and pore. The voltage used is typically from +5 V to−5 V, such as from +4 V to −4 V, +3 V to −3 V or +2 V to −2 V. Thevoltage used is typically from −600 mV to +600 mV or −400 mV to +400 mV.

The voltage used is preferably in a range having a lower limit selectedfrom −400 mV, −300 mV, −200 mV, −150 mV, −100 mV, −50 mV, −20 mV and 0mV and an upper limit independently selected from +10 mV, +20 mV, +50mV, +100 mV, +150 mV, +200 mV, +300 mV and +400 mV. The voltage used ismore preferably in the range 100 mV to 240 mV and most preferably in therange of 120 mV to 220 mV. It is possible to increase discriminationbetween different nucleotides by a pore by using an increased appliedpotential.
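The independently selected lower and upper limits above define a family of voltage ranges. The Python sketch below, which uses hypothetical names and is given only as an illustration, enumerates those ranges and classifies a given voltage against the preferred, more preferred and most preferred ranges stated in the text.

```python
from itertools import product

# Limits in millivolts, taken from the paragraph above.
LOWER_LIMITS_MV = (-400, -300, -200, -150, -100, -50, -20, 0)
UPPER_LIMITS_MV = (10, 20, 50, 100, 150, 200, 300, 400)


def candidate_ranges():
    """Every (lower, upper) pair formed by independent selection of the limits."""
    return list(product(LOWER_LIMITS_MV, UPPER_LIMITS_MV))


def classify_voltage(mv: float) -> str:
    """Classify a voltage against the ranges described in the text."""
    if 120 <= mv <= 220:
        return "most preferred"
    if 100 <= mv <= 240:
        return "more preferred"
    # The widest preferred range runs from the lowest lower limit
    # to the highest upper limit.
    if min(LOWER_LIMITS_MV) <= mv <= max(UPPER_LIMITS_MV):
        return "preferred"
    return "outside"


print(len(candidate_ranges()))
print(classify_voltage(180))
```

Because every lower limit is at or below 0 mV and every upper limit is above 0 mV, each of the pairs is a valid range.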

The method is typically carried out in the presence of any chargecarriers, such as metal salts, for example alkali metal salt, halidesalts, for example chloride salts, such as alkali metal chloride salt.Charge carriers may include ionic liquids or organic salts, for exampletetramethyl ammonium chloride, trimethylphenyl ammonium chloride,phenyltrimethyl ammonium chloride, or 1-ethyl-3-methyl imidazoliumchloride. In the exemplary apparatus discussed above, the salt ispresent in the aqueous solution in the chamber. Potassium chloride(KCl), sodium chloride (NaCl), caesium chloride (CsCl) or a mixture ofpotassium ferrocyanide and potassium ferricyanide is typically used.KCl, NaCl and a mixture of potassium ferrocyanide and potassiumferricyanide are preferred. The charge carriers may be asymmetric acrossthe membrane. For instance, the type and/or concentration of the chargecarriers may be different on each side of the membrane.

The salt concentration may be at saturation. The salt concentration maybe 3 M or lower and is typically from 0.1 to 2.5 M, from 0.3 to 1.9 M,from 0.5 to 1.8 M, from 0.7 to 1.7 M, from 0.9 to 1.6 M or from 1 M to1.4 M. The salt concentration is preferably from 150 mM to 1 M. Themethod is preferably carried out using a salt concentration of at least0.3 M, such as at least 0.4 M, at least 0.5 M, at least 0.6 M, at least0.8 M, at least 1.0 M, at least 1.5 M, at least 2.0 M, at least 2.5 M orat least 3.0 M. High salt concentrations provide a high signal to noiseratio and allow for currents indicative of the presence of a nucleotideto be identified against the background of normal current fluctuations.

The method is typically carried out in the presence of a buffer. In theexemplary apparatus discussed above, the buffer is present in theaqueous solution in the chamber. Any buffer may be used in the method ofthe invention. Typically, the buffer is phosphate buffer. Other suitablebuffers are HEPES and Tris-HCl buffer. The methods are typically carriedout at a pH of from 4.0 to 12.0, from 4.5 to 10.0, from 5.0 to 9.0, from5.5 to 8.8, from 6.0 to 8.7 or from 7.0 to 8.8 or 7.5 to 8.5. The pHused is preferably about 7.5.

The method may be carried out at from 0° C. to 100° C., from 15° C. to 95° C., from 16° C. to 90° C., from 17° C. to 85° C., from 18° C. to 80° C., from 19° C. to 70° C., or from 20° C. to 60° C. The methods are typically carried out at room temperature. The methods are optionally carried out at a temperature that supports enzyme function, such as about 37° C.

Polynucleotide Binding Protein

The method preferably comprises contacting the/each polynucleotide witha polynucleotide binding protein such that the protein controls themovement of at least one strand of the/each polynucleotide through thepore.

More preferably, the method comprises (a) contacting the/eachpolynucleotide with the pore and a polynucleotide binding protein suchthat the protein controls the movement of at least one strand ofthe/each polynucleotide through the pore and (b) taking one or moremeasurements as the/each polynucleotide moves with respect to the pore,wherein the measurements are indicative of one or more characteristicsof the/each polynucleotide, and thereby characterising the modified ortemplate polynucleotide.

The polynucleotide binding protein may be any protein that is capable ofbinding to the polynucleotide and controlling its movement through thepore. It is straightforward in the art to determine whether or not aprotein binds to a polynucleotide. The protein typically interacts withand modifies at least one property of the polynucleotide. The proteinmay modify the polynucleotide by cleaving it to form individualnucleotides or shorter chains of nucleotides, such as di- ortrinucleotides. The protein may modify the polynucleotide by orientingit or moving it to a specific position, i.e. controlling its movement.

The polynucleotide binding protein is preferably derived from apolynucleotide handling enzyme. A polynucleotide handling enzyme is apolypeptide that is capable of interacting with and modifying at leastone property of a polynucleotide. The enzyme may modify thepolynucleotide by cleaving it to form individual nucleotides or shorterchains of nucleotides, such as di- or trinucleotides. The enzyme maymodify the polynucleotide by orienting it or moving it to a specificposition. The polynucleotide handling enzyme does not need to displayenzymatic activity as long as it is capable of binding thepolynucleotide and controlling its movement through the pore. Forinstance, the enzyme may be modified to remove its enzymatic activity ormay be used under conditions which prevent it from acting as an enzyme.Such conditions are discussed in more detail below.

The polynucleotide handling enzyme is preferably derived from a nucleolytic enzyme. The polynucleotide handling enzyme used in the construct of the enzyme is more preferably derived from a member of any of the Enzyme Classification (EC) groups 3.1.11, 3.1.13, 3.1.14, 3.1.15, 3.1.16, 3.1.21, 3.1.22, 3.1.25, 3.1.26, 3.1.27, 3.1.30 and 3.1.31. The enzyme may be any of those disclosed in International Application No. PCT/GB10/000133 (published as WO 2010/086603).

Preferred enzymes are polymerases, exonucleases, helicases and topoisomerases, such as gyrases. Suitable enzymes include, but are not limited to, exonuclease I from E. coli (SEQ ID NO: 11), exonuclease III enzyme from E. coli (SEQ ID NO: 13), RecJ from T. thermophilus (SEQ ID NO: 15), bacteriophage lambda exonuclease (SEQ ID NO: 17), TatD exonuclease and variants thereof. Three subunits comprising the sequence shown in SEQ ID NO: 15 or a variant thereof interact to form a trimer exonuclease. The polymerase may be PyroPhage® 3173 DNA Polymerase (which is commercially available from Lucigen® Corporation), SD Polymerase (commercially available from Bioron®) or variants thereof. The enzyme is preferably Phi29 DNA polymerase (SEQ ID NO: 9) or a variant thereof. The topoisomerase is preferably a member of any of the Enzyme Classification (EC) groups 5.99.1.2 and 5.99.1.3.

The enzyme is most preferably derived from a helicase, such as Hel308 Mbu (SEQ ID NO: 18), Hel308 Csy (SEQ ID NO: 19), Hel308 Tga (SEQ ID NO: 20), Hel308 Mhu (SEQ ID NO: 21), TraI Eco (SEQ ID NO: 22), XPD Mbu (SEQ ID NO: 23) or a variant thereof. Any helicase may be used in the invention. The helicase may be or be derived from a Hel308 helicase, a RecD helicase, such as TraI helicase or a TrwC helicase, a XPD helicase or a Dda helicase. The helicase may be any of the helicases, modified helicases or helicase constructs disclosed in International Application Nos. PCT/GB2012/052579 (published as WO 2013/057495); PCT/GB2012/053274 (published as WO 2013/098562); PCT/GB2012/053273 (published as WO 2013/098561); PCT/GB2013/051925 (published as WO 2014/013260); PCT/GB2013/051924 (published as WO 2014/013259); PCT/GB2013/051928 (published as WO 2014/013262) and PCT/GB2014/052736.

The helicase preferably comprises the sequence shown in SEQ ID NO: 25 (TrwC Cba) or a variant thereof, the sequence shown in SEQ ID NO: 18 (Hel308 Mbu) or a variant thereof or the sequence shown in SEQ ID NO: 24 (Dda) or a variant thereof. Variants may differ from the native sequences in any of the ways discussed below for transmembrane pores. A preferred variant of SEQ ID NO: 24 comprises (a) E94C and A360C or (b) E94C, A360C, C109A and C136A and then optionally (ΔM1)G1G2 (i.e. deletion of M1 and then addition of G1 and G2).

In strand sequencing, the polynucleotide is translocated through thepore either with or against an applied potential. Exonucleases that actprogressively or processively on double stranded polynucleotides can beused on the cis side of the pore to feed the remaining single strandthrough under an applied potential or the trans side under a reversepotential. Likewise, a helicase that unwinds the double stranded DNA canalso be used in a similar manner. A polymerase may also be used. Thereare also possibilities for sequencing applications that require strandtranslocation against an applied potential, but the DNA must be first“caught” by the enzyme under a reverse or no potential. With thepotential then switched back following binding the strand will pass cisto trans through the pore and be held in an extended conformation by thecurrent flow. The single strand DNA exonucleases or single strand DNAdependent polymerases can act as molecular motors to pull the recentlytranslocated single strand back through the pore in a controlledstepwise manner, trans to cis, against the applied potential.

Any helicase may be used in the method. Helicases may work in two modeswith respect to the pore. First, the method is preferably carried outusing a helicase such that it moves the polynucleotide through the porewith the field resulting from the applied voltage. In this mode the 5′end of the polynucleotide is first captured in the pore, and thehelicase moves the polynucleotide into the pore such that it is passedthrough the pore with the field until it finally translocates through tothe trans side of the membrane. Alternatively, the method is preferablycarried out such that a helicase moves the polynucleotide through thepore against the field resulting from the applied voltage. In this modethe 3′ end of the polynucleotide is first captured in the pore, and thehelicase moves the polynucleotide through the pore such that it ispulled out of the pore against the applied field until finally ejectedback to the cis side of the membrane.
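The two helicase modes described above can be summarised as a small lookup keyed on the end of the polynucleotide that is first captured by the pore. The Python below is a sketch for illustration only; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelicaseMode:
    """One of the two modes in which a helicase may work with respect to the pore."""
    captured_end: str       # end of the polynucleotide first captured in the pore
    movement: str           # movement relative to the field from the applied voltage
    final_side: str         # side of the membrane on which the strand ends up


# The two modes as described in the paragraph above.
MODES = {
    "5'": HelicaseMode("5'", "with the field", "trans"),
    "3'": HelicaseMode("3'", "against the field", "cis"),
}


def mode_for_captured_end(end: str) -> HelicaseMode:
    """Look up the mode implied by which end is captured first."""
    try:
        return MODES[end]
    except KeyError:
        raise ValueError("captured end must be \"5'\" or \"3'\"") from None


print(mode_for_captured_end("5'"))
```

A strand captured by its 5′ end therefore translocates to the trans side, whereas a strand captured by its 3′ end is ejected back to the cis side.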

Helicase(s) and Molecular Brake(s)

In a preferred embodiment, the method comprises:

-   (a) providing the/each polynucleotide with one or more helicases and one or more molecular brakes attached to the/each polynucleotide;
-   (b) contacting the/each polynucleotide with a transmembrane pore and applying a potential across the pore such that the one or more helicases and the one or more molecular brakes are brought together and both control the movement of at least one strand of the/each polynucleotide through the pore;
-   (c) taking one or more measurements as the/each polynucleotide moves with respect to the pore wherein the measurements are indicative of one or more characteristics of the polynucleotide and thereby characterising the modified or template polynucleotide.

This type of method is discussed in detail in the InternationalApplication PCT/GB2014/052737.

Spacers

The one or more helicases may be stalled at one or more spacers asdiscussed in International Application No. PCT/GB2014/050175 (publishedas WO2014/135838). Any configuration of one or more helicases and one ormore spacers disclosed in the International Application may be used inthis invention.

When a part of the polynucleotide enters the pore and moves through thepore along the field resulting from the applied potential, the one ormore helicases are moved past the spacer by the pore as thepolynucleotide moves through the pore. This is because thepolynucleotide (including the one or more spacers) moves through thepore and the one or more helicases remain on top of the pore.

The one or more spacers are preferably part of the polynucleotide, for instance they interrupt the polynucleotide sequence. The one or more spacers are preferably not part of one or more blocking molecules, such as speed bumps, hybridised to the polynucleotide.

There may be any number of spacers in the polynucleotide, such as 1, 2,3, 4, 5, 6, 7, 8, 9, 10 or more spacers. There are preferably two, fouror six spacers in the polynucleotide. The one or more spacers arepreferably in the Y adaptor or leader sequence. There may be one or morespacers in different regions of the polynucleotide, such as one or morespacers in the Y adaptor and/or hairpin loop adaptor.

The one or more spacers each provides an energy barrier which the one ormore helicases cannot overcome even in the active mode. The one or morespacers may stall the one or more helicases by reducing the traction ofthe helicase (for instance by removing the bases from the nucleotides inthe polynucleotide) or physically blocking movement of the one or morehelicases (for instance using a bulky chemical group).

The one or more spacers may comprise any molecule or combination ofmolecules that stalls the one or more helicases. The one or more spacersmay comprise any molecule or combination of molecules that prevents theone or more helicases from moving along the polynucleotide. It isstraightforward to determine whether or not the one or more helicasesare stalled at one or more spacers in the absence of a transmembranepore and an applied potential. For instance, the ability of a helicaseto move past a spacer and displace a complementary strand of DNA can bemeasured by PAGE.

The one or more spacers typically comprise a linear molecule, such as a polymer. The one or more spacers typically have a different structure from the polynucleotide. For instance, if the polynucleotide is DNA, the one or more spacers are typically not DNA. In particular, if the polynucleotide is deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), the one or more spacers preferably comprise peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA), locked nucleic acid (LNA) or a synthetic polymer with nucleotide side chains. The one or more spacers may comprise one or more nucleotides in the opposite direction from the polynucleotide. For instance, the one or more spacers may comprise one or more nucleotides in the 3′ to 5′ direction when the polynucleotide is in the 5′ to 3′ direction. The nucleotides may be any of those discussed above.

The one or more helicases may be stalled by (i.e. before) or on each linear molecule spacer. If linear molecule spacers are used, the polynucleotide is preferably provided with a double stranded region of polynucleotide adjacent to the end of each spacer past which the one or more helicases are to be moved. If linear molecule spacers are used, the polynucleotide is preferably provided with a blocking molecule at the end of each spacer opposite to the end past which the one or more helicases are to be moved. This can help to ensure that the one or more helicases remain stalled on each spacer. It may also help retain the one or more helicases on the polynucleotide in the case that it/they diffuse(s) off in solution. The blocking molecule may be any of the chemical groups discussed below which physically cause the one or more helicases to stall. The blocking molecule may be a double stranded region of polynucleotide. The blocking molecule may be BNA.

The method may concern moving two or more helicases past a spacer. In such instances, the length of the spacer is typically increased to prevent the trailing helicase from pushing the leading helicase past the spacer in the absence of the pore and applied potential. If the method concerns moving two or more helicases past one or more spacers, the spacer lengths discussed above may be increased at least 1.5 fold, such as 2 fold, 2.5 fold or 3 fold. For instance, if the method concerns moving two or more helicases past one or more spacers, the spacer lengths may be increased 1.5 fold, 2 fold, 2.5 fold or 3 fold.

Membrane

The pore used in the invention may be present in a membrane. In the method of the invention, the polynucleotide is typically contacted with the pore in a membrane. Any membrane may be used in accordance with the invention. Suitable membranes are well-known in the art. The membrane is preferably an amphiphilic layer. An amphiphilic layer is a layer formed from amphiphilic molecules, such as phospholipids, which have both hydrophilic and lipophilic properties. The amphiphilic molecules may be synthetic or naturally occurring. Non-naturally occurring amphiphiles and amphiphiles which form a monolayer are known in the art and include, for example, block copolymers (Gonzalez-Perez et al., Langmuir, 2009, 25, 10447-10450). Block copolymers are polymeric materials in which two or more monomer sub-units are polymerized together to create a single polymer chain. Block copolymers typically have properties that are contributed by each monomer sub-unit. However, a block copolymer may have unique properties that polymers formed from the individual sub-units do not possess. Block copolymers can be engineered such that one of the monomer sub-units is hydrophobic (i.e. lipophilic), whilst the other sub-unit(s) are hydrophilic whilst in aqueous media. In this case, the block copolymer may possess amphiphilic properties and may form a structure that mimics a biological membrane. The block copolymer may be a diblock (consisting of two monomer sub-units), but may also be constructed from more than two monomer sub-units to form more complex arrangements that behave as amphiphiles. The copolymer may be a triblock, tetrablock or pentablock copolymer. The membrane is preferably a triblock copolymer membrane.

The membrane is most preferably one of the membranes disclosed in International Application No. PCT/GB2013/052766 or PCT/GB2013/052767.

The amphiphilic molecules may be chemically-modified or functionalised to facilitate coupling of the polynucleotide.

Coupling

The/each modified polynucleotide is preferably coupled to the membrane comprising the pore. The method may comprise coupling the/each polynucleotide to the membrane comprising the pore. The polynucleotide is preferably coupled to the membrane using one or more anchors. The polynucleotide may be coupled to the membrane using any known method.

Each anchor comprises a group which couples (or binds) to the polynucleotide and a group which couples (or binds) to the membrane. Each anchor may covalently couple (or bind) to the polynucleotide and/or the membrane. The polynucleotide is preferably coupled to the membrane using the Y adaptor or leader sequence and/or the hairpin loop.

The polynucleotide may be coupled to the membrane using any number of anchors, such as 2, 3, 4 or more anchors. For instance, a polynucleotide may be coupled to the membrane using two anchors each of which separately couples (or binds) to both the polynucleotide and membrane.

The one or more anchors may comprise the one or more helicases and/or the one or more molecular brakes.

If the membrane is an amphiphilic layer, such as a copolymer membrane or a lipid bilayer, the one or more anchors preferably comprise a polypeptide anchor present in the membrane and/or a hydrophobic anchor present in the membrane. The hydrophobic anchor is preferably a lipid, fatty acid, sterol, carbon nanotube, polypeptide, protein or amino acid, for example cholesterol, palmitate or tocopherol. In preferred embodiments, the one or more anchors are not the pore.

The components of the membrane, such as the amphiphilic molecules, copolymer or lipids, may be chemically-modified or functionalised to form the one or more anchors. Examples of suitable chemical modifications and suitable ways of functionalising the components of the membrane are discussed in more detail below. Any proportion of the membrane components may be functionalised, for example at least 0.01%, at least 0.1%, at least 1%, at least 10%, at least 25%, at least 50% or 100%.

The polynucleotide may be coupled directly to the membrane. The one or more anchors used to couple the polynucleotide to the membrane preferably comprise a linker. The one or more anchors may comprise one or more, such as 2, 3, 4 or more, linkers. One linker may be used to couple more than one, such as 2, 3, 4 or more, polynucleotides to the membrane.

Preferred linkers include, but are not limited to, polymers, such as polynucleotides, polyethylene glycols (PEGs), polysaccharides and polypeptides. These linkers may be linear, branched or circular. For instance, the linker may be a circular polynucleotide. The polynucleotide may hybridise to a complementary sequence on the circular polynucleotide linker.

The use of a linker is preferred in the sequencing embodiments discussed below. If a polynucleotide is permanently coupled directly to the membrane in the sense that it does not uncouple when interacting with the pore (i.e. does not uncouple in step (b) or (e)), then some sequence data will be lost as the sequencing run cannot continue to the end of the polynucleotide due to the distance between the membrane and the pore. If a linker is used, then the polynucleotide can be processed to completion.

The coupling may be permanent or stable. In other words, the coupling may be such that the polynucleotide remains coupled to the membrane when interacting with the pore.

The coupling may be transient. In other words, the coupling may be such that the polynucleotide may decouple from the membrane when interacting with the pore.

Suitable methods of coupling are disclosed in International Application No. PCT/GB12/051191 (published as WO 2012/164270) and UK Application No. 1406155.0.

Uncoupling

The method of the invention may involve characterising multiple modified double stranded polynucleotides and uncoupling of at least the first modified double stranded polynucleotide.

In a preferred embodiment, the invention involves characterising two or more modified double stranded polynucleotides. The method comprises:

(a) providing a first modified double stranded polynucleotide in a first sample;
(b) providing a second modified double stranded polynucleotide in a second sample;
(c) coupling the first polynucleotide in the first sample to a membrane using one or more anchors;
(d) contacting the first polynucleotide with a transmembrane pore such that at least one strand of the first polynucleotide moves through the pore;
(e) taking one or more measurements as the first polynucleotide moves with respect to the pore wherein the measurements are indicative of one or more characteristics of the first polynucleotide and thereby characterising the first polynucleotide;
(f) uncoupling the first polynucleotide from the membrane;
(g) coupling the second polynucleotide in the second sample to the membrane using one or more anchors;
(h) contacting the second polynucleotide with the pore such that at least one strand of the second polynucleotide moves through the pore; and
(i) taking one or more measurements as the second polynucleotide moves with respect to the pore wherein the measurements are indicative of one or more characteristics of the second polynucleotide and thereby characterising the second polynucleotide.

This type of method is discussed in detail in the UK Application No. 1406155.0.

Other Characterisation Method

In another embodiment, the/each modified double stranded polynucleotide is characterised by detecting labelled species that are released as a polymerase incorporates nucleotides into the polynucleotide. The polymerase uses the polynucleotide as a template. Each labelled species is specific for each nucleotide. The/each polynucleotide is contacted with a transmembrane pore, a polymerase and labelled nucleotides such that phosphate labelled species are sequentially released when nucleotides are added to the polynucleotide(s) by the polymerase, wherein the phosphate species contain a label specific for each nucleotide. The polymerase may be any of those discussed above. The phosphate labelled species are detected using the pore, thereby characterising the polynucleotide. This type of method is disclosed in European Application No. 13187149.3 (published as EP 2682460). Any of the embodiments discussed above equally apply to this method.

Kits

The present invention also provides a kit for modifying a template polynucleotide. The kit comprises (a) a population of MuA substrates of the invention, (b) a MuA transposase and (c) a polymerase. Any of the embodiments discussed above with reference to the methods and products of the invention equally apply to the kits.

The kit may further comprise the components of a membrane, such as the components of an amphiphilic layer or a lipid bilayer. The kit may further comprise a transmembrane pore or the components of a transmembrane pore. The kit may further comprise a polynucleotide binding protein. Suitable membranes, pores and polynucleotide binding proteins are discussed above.

The kit of the invention may additionally comprise one or more other reagents or instruments which enable any of the embodiments mentioned above to be carried out. Such reagents or instruments include one or more of the following: suitable buffer(s) (aqueous solutions), means to obtain a sample from a subject (such as a vessel or an instrument comprising a needle), means to amplify and/or express polynucleotides, a membrane as defined above or voltage or patch clamp apparatus. Reagents may be present in the kit in a dry state such that a fluid sample resuspends the reagents. The kit may also, optionally, comprise instructions to enable the kit to be used in the method of the invention or details regarding which patients the method may be used for. The kit may, optionally, comprise nucleotides.

The following Example illustrates the invention.

Example 1

This example describes a method for modifying a template double stranded polynucleotide, especially for characterisation using nanopore sequencing. This example shows that MuA transposase was able to insert MuA substrates which contained a hairpin loop. The gaps in the construct were then filled using a polymerase and the double-stranded construct was then heated in order to melt the double-stranded DNA. This resulted in single-stranded DNA which had a hairpin from which a polymerase produced a complement. This construct was then ligated to an adaptor with a pre-bound enzyme and finally hybridised to a tether. This DNA construct then exhibited helicase controlled DNA movement through a nanopore.

Materials and Methods

1.1—Fragmentation of the DNA Template Using the MuA Transposase

The MuA adapter X that was used in this example had a 5′ 21 bp hairpin (adapter labelled c in FIG. 2, upper strand=SEQ ID NO: 29, lower strand=SEQ ID NO: 30 attached at its 5′ end to the 3′ end of the sequence GATCU). The upper and lower strands of the adapter were annealed at 10 μM, from 95° C. at 2° C. min⁻¹, in 10 mM Tris pH 7.5, 50 mM NaCl.

The MuA fragmentation reactions (10 μL) were set up as described in Table 1 below and incubated for 1 hour at 30° C. The MuA enzyme was then heat inactivated by heating at 75° C. for 15 minutes. Finally, the resultant DNA was 1.5× SPRI purified and eluted in nuclease free water (42 μL, Sample 1).

TABLE 1

Reagent                       Components
Lambda DNA (SEQ ID NO: 31)    50 ng μL⁻¹
MuA adapter X                 200 nM
MuA (Thermo)                  50 nM
Buffer                        25 mM Tris-HCl pH 8.0, 10 mM MgCl2, 110 mM NaCl, 0.05% Triton X-100, 10% glycerol

1.2—Incubation of the DNA Template with DNA Polymerase

Following the MuA fragmentation procedure, the purified DNA was then incubated with a DNA polymerase in order to copy the upper strand hairpin.

The DNA polymerase reactions (50 μL) were set up as described in Table 2 below and incubated at 68° C. for 10 minutes. Finally, the resultant DNA was 1.5× SPRI purified and eluted in nuclease free water (42 μL, Sample 2).

TABLE 2

Reagent                  Components
Sample 1                 39 μL
10× ThermoPol Buffer     5 μL of 200 mM Tris-HCl pH 8.8, 100 mM (NH₄)₂SO₄, 100 mM KCl, 20 mM MgSO₄, 1.0% Triton® X-100
9°N Polymerase (NEB)     5 μL
dNTPs                    1 μL of 10 mM (final concentration 0.2 mM)
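The final dNTP concentration quoted in Table 2 follows from the usual dilution relation C1·V1 = C2·V2. The following is a minimal Python sketch of that arithmetic only; the function name `final_concentration_mM` and the dictionary layout are illustrative assumptions and are not part of the described method.

```python
def final_concentration_mM(stock_mM: float, added_uL: float, total_uL: float) -> float:
    """Solve C1*V1 = C2*V2 for the final concentration C2."""
    if total_uL <= 0:
        raise ValueError("total volume must be positive")
    return stock_mM * added_uL / total_uL


# Volumes (in microlitres) as listed in Table 2.
volumes_uL = {
    "Sample 1": 39.0,
    "10x ThermoPol Buffer": 5.0,
    "9N Polymerase": 5.0,
    "dNTPs": 1.0,
}

# The component volumes should add up to the stated 50 uL reaction.
total_uL = sum(volumes_uL.values())

# 1 uL of a 10 mM dNTP stock diluted into the full reaction volume.
dntp_final_mM = final_concentration_mM(10.0, volumes_uL["dNTPs"], total_uL)

print(f"total volume: {total_uL} uL")
print(f"final dNTP concentration: {dntp_final_mM} mM")
```

The same relation applies to the 10× buffer, where 5 μL in a 50 μL reaction gives a 1× working concentration.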

1.3-Heat Denaturation and Polymerase Fill-In

Following the hairpin copying stage, Sample 2 was then treated to a single denaturation step and polymerase fill-in. For the polymerase fill-in reaction the polymerase was provided with dCTP/dGTP/dATP but the standard dTTP was replaced with a different nucleotide species, 5-propynyl-dU. The reactions (50 μL) were set up as described in Table 3 below and incubated for 2 minutes at 95° C., 30 seconds at 55° C. and 30 minutes at 68° C. Finally, the resultant DNA was 1.5× SPRI purified and eluted in nuclease free water (45 μL, Sample 3).

TABLE 3

Reagent                                  Components
Sample 2                                 39 μL
10× ThermoPol Buffer                     5 μL of 200 mM Tris-HCl pH 8.8, 100 mM (NH₄)₂SO₄, 100 mM KCl, 20 mM MgSO₄, 1.0% Triton® X-100
9°N Polymerase (NEB)                     5 μL
dNTPs (5-propynyl-dU/dCTP/dGTP/dATP)     1 μL of 10 mM (final concentration 0.2 mM)
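The incubation profile used in step 1.3 can be written down as a simple list of temperature holds. The sketch below is illustrative only: the `Hold` class and the step labels are assumptions introduced here (the source gives only temperatures and times), and ramp times between holds are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    label: str      # descriptive label (assumed, not from the source)
    temp_c: float   # hold temperature in degrees Celsius
    seconds: int    # hold duration in seconds


# Temperatures and durations as stated for step 1.3.
PROGRAM = [
    Hold("denaturation", 95.0, 2 * 60),
    Hold("cooling step (purpose assumed: hairpin re-folding)", 55.0, 30),
    Hold("polymerase fill-in", 68.0, 30 * 60),
]

# Total hold time, excluding any instrument ramp time.
total_minutes = sum(step.seconds for step in PROGRAM) / 60

for step in PROGRAM:
    print(f"{step.temp_c:>5.1f} C for {step.seconds:>4d} s  ({step.label})")
print(f"total hold time: {total_minutes} min")
```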

1.4—dA Tailing Reaction

Sample 3 was then dA-tailed as described in Table 4 below and incubated for 30 minutes at 37° C. The resultant DNA was 1.5× SPRI purified and eluted in nuclease free water (20 μL, Sample 4).

TABLE 4

Reagent                       Components
Sample 3                      42 μL
10× NEB dA-tailing buffer     5 μL
Klenow exo⁻                   3 μL

1.5—Ligation of Adapter with Pre-Loaded Enzyme

Sample 4 was then ligated to Y-adapter 1 (upper strand=20 iSpC3 spacers attached at the 3′ end to SEQ ID NO: 32 which was attached at the 3′ end to four iSp18 spacers which were attached at the 3′ end to SEQ ID NO: 33, bottom strand=SEQ ID NO: 34 which had a 5′ phosphate attached) with pre-loaded enzyme (T4 Dda-E94C/A360C/C109A/C136A (SEQ ID NO: 24 with mutations E94C/A360C/C114A/C171A/C421D and then (ΔM1)G1G2)) as described in Table 5 below and incubated for 20 minutes at room temperature. The resultant DNA was 0.4× SPRI purified, washed with buffer (200 μL of 750 mM NaCl, 10% PEG 8000, 50 mM Tris-HCl pH 8) and eluted in buffer (20 μL of 40 mM CAPS pH 10, 40 mM KCl, Sample 5).

TABLE 5

Reagent                   Components
Sample 4                  20 μL
Y-adapter 1               5 μL
NEB Blunt TA MM (2×)      25 μL

1.6—Annealing of Tether

The DNA analytes present in Sample 5 were then annealed to a tether. Sample 5 was incubated with the DNA tether (500 nM, 5 μL; the sequence AACAACCT was attached at its 5′ end to three iSp18 spacers, two thymines and a 5′ cholesterol TEG, and the sequence AACAACCT was attached at its 3′ end to three iSp18 spacers which are attached at the 3′ end to SEQ ID NO: 35) for 10 minutes at room temperature. The resultant sample was known as Sample 6.

1.7—Electrophysiology Testing

Prior to setting up the experiment, DNA sample 6 (a quarter of the total volume of sample 6) was added to buffer (25 mM Potassium Phosphate buffer (pH 7.5), 500 mM KCl), MgCl2 (1 mM) and ATP (2 mM) which gave a total volume of 150 μL.
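As a rough guide to the dilutions involved, the sketch below estimates how much of Sample 6 goes into the pre-mix and the resulting tether concentration. It assumes, purely for illustration, that all 20 μL of Sample 5 was combined with the 5 μL of tether so that Sample 6 has a volume of 25 μL; the source does not state this volume explicitly, and the variable names are hypothetical.

```python
# Assumed volumes (uL): all of Sample 5 (20 uL elution) plus 5 uL of tether.
sample5_uL = 20.0
tether_uL = 5.0
tether_stock_nM = 500.0

sample6_uL = sample5_uL + tether_uL          # assumed total volume of Sample 6

# Tether concentration in Sample 6 after mixing (C1*V1 = C2*V2).
tether_in_sample6_nM = tether_stock_nM * tether_uL / sample6_uL

# A quarter of Sample 6 is diluted into a 150 uL pre-mix.
aliquot_uL = sample6_uL / 4
premix_uL = 150.0
dilution_factor = premix_uL / aliquot_uL
tether_in_premix_nM = tether_in_sample6_nM / dilution_factor

print(f"aliquot of Sample 6: {aliquot_uL} uL")
print(f"dilution factor into pre-mix: {dilution_factor}")
print(f"tether in pre-mix: {tether_in_premix_nM:.2f} nM")
```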

Electrical measurements were acquired from single MspA nanopores inserted in block co-polymer in buffer (25 mM K Phosphate buffer, 150 mM Potassium Ferrocyanide (II) and 150 mM Potassium Ferricyanide (III), pH 8.0). After achieving a single pore inserted in the block co-polymer, buffer (2 mL, 25 mM K Phosphate buffer, 150 mM Potassium Ferrocyanide (II), 150 mM Potassium Ferricyanide (III), pH 8.0) was flowed through the system to remove any excess MspA nanopores. The enzyme (T4 Dda-E94C/C109A/C136A/A360C, 10 nM final concentration), DNA sample 6 and fuel (MgCl2 2 mM final concentration, ATP 2 mM final concentration) pre-mix (150 μL total) was then flowed into the single nanopore experimental system. The experiment was run at 120 mV and helicase-controlled DNA movement monitored for 6 hours.

Results

Helicase controlled DNA movement of the DNA produced at the end of the sample preparation procedure (sample 6) was observed. FIG. 3 shows an example of a helicase controlled DNA movement.

The sample preparation procedure was also analysed using an Agilent 12,000 DNA chip trace. If there was no pre-incubation at 68° C., prior to step 1.2 (where the 5′ hairpin was transcribed), then following strand dissociation (heat denaturation step 1.3) no synthetic complement was made (shown as a dash/dotted line after step 4 of FIG. 2), as the strands lacked the necessary 3′ hairpin from which the polymerase was initiated. This was observed in the Agilent 12,000 DNA chip trace shown in FIG. 4, where the line labelled 1 was the untreated MuA fragmented DNA input material, the line labelled 2 was the analyte that had the 68° C. incubation step (in 1.2 above) and subsequently had undergone all of step 1.3 and the line labelled 3 did not have the 68° C. incubation in step 1.2 but had undergone all of step 1.3. As such, for line 3 no dsDNA was made and, therefore, a flat line (region labelled X) was observed on the Agilent trace as there was no hairpin copied before strand dissociation at 95° C. However, for line 2 the hairpin was transcribed and as such on strand dissociation the polymerase initiated fill-in from the new 3′ hairpin. This meant that line 2 showed a peak in region X which corresponded to the dsDNA product that was made from the copied hairpin.

The above procedure was repeated as described above; however, in step 1.3 the polymerase was provided with the standard DNA dNTPs (dCTP/dATP/dGTP/dTTP) rather than 5-propynyl-dU, which produced DNA Sample 7. FIG. 6 shows example helicase-controlled DNA movement for DNA Sample 7 (which was produced using standard DNA dNTPs in step 1.3). The sample preparation procedure was successful and helicase controlled DNA movements were observed for this sample.

Example 2

This example describes a method for modifying a template double stranded polynucleotide, especially for characterisation using nanopore sequencing. FIG. 7 shows a cartoon representation of the sample preparation steps described in steps 2.1 and 2.2 below. This example shows that MuA transposase was able to insert MuA substrates which contained a hairpin loop which contained analogues of dG and dC in the hairpin loop of the MuA adapter (dG was replaced with deoxyinosine and dC was replaced with deoxyzebularine). The gaps in the construct were then filled using a polymerase which replaced the overhang strands with new strands which complemented the strands comprising the hairpin loops. The new strand, which complemented the strand comprising the hairpin loop, was also capable of forming a hairpin loop. The hairpin loop of the new strand had a higher Tm than the double stranded region formed between the complementary strand and the hairpin loop which was made of A/T/Z/I (labelled 1× in FIG. 7). Therefore, the hairpin in the new strand formed (labelled f2h in FIG. 7) and the hairpin loop made of A/T/Z/I also formed (labelled f1h in FIG. 7). The polymerase then used the hairpin loop as a primer to make the complementary strands. Therefore, there was no need for an additional heating step to separate the dsDNA construct produced after step 2 of FIG. 7.

2.1—Fragmentation of the DNA Template Using the MuA Transposase

The MuA adapter used in this example had a 5′ 7 bp hairpin with dG replaced with dInosine and dC replaced with dZebularine. The upper strands (the modified polynucleotide sequence IZITAZ (where I is deoxyinosine and Z is deoxyzebularine) is attached to the 5′ end of the non-modified polynucleotide sequence TTTTTA which is attached at the 3′ end to the modified polynucleotide sequence ITAZIZ (where I is deoxyinosine and Z is deoxyzebularine) which is attached to the 5′ end of SEQ ID NO: 39) and lower strands (SEQ ID NO: 38) of the adapter P were annealed at 10 μM, from 95° C. at 2° C. min⁻¹, in 10 mM Tris pH 7.5, 50 mM NaCl.
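The self-complementarity of the hairpin segments quoted above can be checked with a few lines of Python. This is a sketch under stated assumptions rather than a model of the chemistry: it assumes that I and Z pair with each other within the adapter hairpin (standing in for G and C), and that a polymerase copying the hairpin inserts C opposite I and G opposite Z. The names `PAIR`, `COPY` and `reverse_complement` are introduced here for illustration only.

```python
# Assumed pairing within the modified hairpin: A:T and I:Z.
PAIR = {"A": "T", "T": "A", "I": "Z", "Z": "I"}

# Assumed templating when the hairpin is copied with natural dNTPs:
# I (a dG analogue) directs C, Z (a dC analogue) directs G.
COPY = {"A": "T", "T": "A", "I": "C", "Z": "G"}


def reverse_complement(seq: str, pairing: dict) -> str:
    """Return the reverse complement of seq under the given pairing table."""
    return "".join(pairing[base] for base in reversed(seq))


# Segments as given in the text, written 5' to 3'.
stem_5p, loop, stem_3p = "IZITAZ", "TTTTTA", "ITAZIZ"
hairpin = stem_5p + loop + stem_3p

# The two modified segments are reverse complements of each other.
print(reverse_complement(stem_5p, PAIR) == stem_3p)

# Including the flanking T and A gives a 7-nucleotide stem on each side,
# consistent with the 7 bp hairpin described.
print(reverse_complement(hairpin[:7], PAIR) == hairpin[-7:])

# Strand expected when the hairpin is copied with natural nucleotides;
# its stem is built from G/C rather than I/Z.
copied = reverse_complement(hairpin, COPY)
print(copied)
```

Under these assumptions the copied strand carries a G/C-containing stem in place of the I/Z stem, which is in line with the statement that the hairpin formed by the new strand has the higher Tm.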

The MuA fragmentation reactions (10 μL) were set up as described in Table 1 above using adapter P instead of adapter X and incubated for 1 hour at 30° C. The MuA enzyme was then heat inactivated by heating at 75° C. for 15 minutes. Finally, the resultant DNA was 1.5× SPRI purified and eluted in nuclease free water (42 μL).

2.2—Incubation of the DNA Template with DNA Polymerase

Following the MuA fragmentation procedure, the purified DNA was then incubated with a DNA polymerase in order to copy the upper strand hairpin (which had G/C replaced with I/Z).

It was during this step that the new strand, which complemented the strand comprising the hairpin loop, formed a hairpin loop. This was owing to the fact that the hairpin loop formed by the new strand had a higher Tm than the double-stranded region formed between the complementary strand and the hairpin loop which contained the analogues dZ and dI. Therefore, there was no need to heat the double-stranded DNA, to separate it into ssDNA, as the hairpin loop with the higher Tm was formed preferentially and the polymerase then used this hairpin loop as a primer to make the complementary strand.

The DNA polymerase reaction (50 μL) was set up as described in Table 6 below and incubated at 37° C. for 30 minutes. Finally, the resultant DNA was 1.5× SPRI purified and eluted in nuclease free water (42 μL).

TABLE 6

Reagent                  Components
Sample 1                 39 μL
10× NEBuffer             5 μL of 100 mM Tris-HCl (pH 7.9), 500 mM NaCl, 100 mM MgCl₂, 10 mM DTT
Klenow Fragment (NEB)    2.5 μL
SSB (Promega)            2.5 μL
dNTPs                    1 μL of 10 mM (final concentration 0.2 mM)

This strand could be further modified by dA tailing, ligating an adapter with an enzyme pre-loaded and hybridising a tether (as described in Example 1.4-1.6) in order to produce a strand which could be characterised using a nanopore system (as described in Example 1.7).

1. A method for modifying a template double stranded polynucleotide, comprising: (a) contacting the template polynucleotide with a MuA transposase and a population of double stranded MuA substrates each comprising (i) at least one overhang and (ii) at least one hairpin loop in the opposite strand from the strand comprising the at least one overhang such that the transposase fragments the template polynucleotide and ligates a substrate to one or both ends of the double stranded fragments and thereby produces a plurality of fragment/substrate constructs; (b) contacting the fragment/substrate constructs with a polymerase such that the polymerase displaces the strands comprising the overhangs and replaces them with strands which complement the strands comprising the hairpin loops and thereby produces a plurality of double stranded constructs each comprising a double stranded fragment of the template polynucleotide; and (c) separating the two strands of the double stranded constructs and using the strands as templates to form a plurality of modified double stranded polynucleotides each comprising two complementary strands linked by at least one hairpin loop.
2. A method according to claim 1, wherein step (c) comprises separating the two strands of the double stranded constructs by increasing one or more of pH, temperature and ionic strength.
3. A method according to claim 1 or 2, wherein step (c) comprises contacting the separated strands with a polymerase such that the polymerase uses the strands as templates to form the plurality of modified double stranded polynucleotides.
4. A method according to claim 1 or 2, wherein step (c) comprises (i) contacting the plurality of separated strands with a population of nucleotide oligomers which comprises every possible combination of nucleotides which are complementary to all of the nucleotides in the strands under conditions in which the oligomers are capable of hybridising to the strands and (ii) ligating together those oligomers that hybridise to the strands to form the plurality of modified double stranded polynucleotides.
5. A method according to claim 1, wherein step (c) comprises contacting the plurality of double stranded constructs with a polymerase such that the polymerase simultaneously separates the two strands of the double stranded constructs and uses the strands as templates to form the plurality of modified double stranded polynucleotides.
6. A method according to any one of the preceding claims, wherein the at least one hairpin loop does not link the two strands of each substrate.
7. A method according to any one of the preceding claims, wherein the at least one overhang is 4, 5 or 6 nucleotides in length.
8. A method according to any one of the preceding claims, wherein each substrate comprises a selectable binding moiety.
9. A method according to any one of the preceding claims, wherein each substrate comprises a leader sequence which is capable of preferentially threading through a transmembrane pore.
10. A method according to any one of claims 1 to 8, wherein the method further comprises (d) attaching Y adaptors to the plurality of modified double stranded polynucleotides at the opposite ends from the hairpin loops and wherein the Y adaptors optionally comprise a leader sequence which is capable of preferentially threading through a transmembrane pore.
11. A method according to any one of the preceding claims, wherein the method further comprises binding one or more polynucleotide binding proteins to the plurality of modified double stranded polynucleotides.
12. A plurality of modified double stranded polynucleotides produced using a method according to any one of the preceding claims.
13. A population of double stranded polynucleotide MuA substrates for modifying a template polynucleotide, wherein the substrates are as defined in any one of claims 1, 6, 7 and 8.
14. A method of characterising at least one polynucleotide modified using a method according to any one of claims 1 to 11, comprising: a) contacting the modified polynucleotide with a transmembrane pore such that at least one strand of the polynucleotide moves through the pore; and b) taking one or more measurements as the at least one strand moves with respect to the pore wherein the measurements are indicative of one or more characteristics of the at least one strand and thereby characterising the modified polynucleotide.
15. A method of characterising a template polynucleotide, comprising: a) modifying the template polynucleotide using a method according to any one of claims 1 to 11 to produce a plurality of modified polynucleotides; b) contacting each modified polynucleotide with a transmembrane pore such that at least one strand of each polynucleotide moves through the pore; and c) taking one or more measurements as each polynucleotide moves with respect to the pore wherein the measurements are indicative of one or more characteristics of each polynucleotide and thereby characterising the template polynucleotide.
16. A method according to claim 14 or 15, wherein both strands of the/each modified polynucleotide move through the pore.
17. A method according to any one of claims 11 to 13, wherein the one or more characteristics are selected from (i) the length of the polynucleotide, (ii) the identity of the polynucleotide, (iii) the sequence of the polynucleotide, (iv) the secondary structure of the polynucleotide and (v) whether or not the polynucleotide is modified.
18. A method according to any one of claims 14 to 17, wherein contacting step (a) or contacting step (b) further comprises contacting the/each modified polynucleotide with a polynucleotide binding protein such that the protein controls the movement of the/each polynucleotide through the pore.
19. A method according to claim 18, wherein the method comprises (i) contacting the/each modified polynucleotide with a transmembrane pore and one or more polynucleotide binding proteins such that at least one strand of the/each polynucleotide moves through the pore and the one or more proteins control the movement of the/each polynucleotide through the pore; and (ii) measuring the current passing through the pore as the/each polynucleotide moves with respect to the pore wherein the current is indicative of one or more characteristics of the/each polynucleotide and thereby characterising the polynucleotide.
20. A method according to claim 19, wherein the one or more polynucleotide binding proteins are bound to the/each modified polynucleotide before it is/they are contacted with the transmembrane pore.
21. A method according to claim 19 or 20, wherein the one or more polynucleotide binding proteins are derived from a helicase.
22. A kit for modifying a template double stranded polynucleotide comprising (a) a population of MuA substrates as defined in any one of claims 1, 6, 7 and 8, (b) a MuA transposase and (c) a polymerase.